Classification of Political Affiliations by Reduced Number of Features

With the evolution of technology, the expression of opinions has shifted to the digital world. Politics, one of the most active topics in opinion mining research, has merged with behavior analysis to determine political affiliation from text, and this combination is the subject of this paper. This study aims to classify news and blog texts as either Republican or Democrat using the minimum number of features. An initial set of 68 features, 64 of which are Linguistic Inquiry and Word Count (LIWC) features, was tested with 14 benchmark classification algorithms. In later experiments, the dimensionality of the feature vector was reduced using 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction”, and “M5 Rule” classifiers, combined with the “SVM” and “IGR” feature selection algorithms, performed best, reaching up to 82.5% accuracy on the given dataset. Further tests using a single feature and the linguistic-based feature sets showed similar results. The feature “Function”, an aggregate feature of the linguistic category, was the most discriminative of the 68 features, achieving 81% accuracy in classifying articles as Republican or Democrat.
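The idea of ranking features and then classifying with a single top feature can be illustrated with a small sketch. It assumes that “IGR” means information gain ratio, scores each numeric feature by the C4.5-style gain ratio of its best binary split, and then evaluates a one-feature decision stump. This is not the authors' pipeline: the helper names (`best_threshold`, `gain_ratio`, `rank_features`, `stump_accuracy`) are hypothetical, the feature values are invented toy numbers that only borrow LIWC-style category names, and the reported figure is training (resubstitution) accuracy, not a cross-validated result comparable to the accuracies above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (gain, threshold, split_info) for the binary split x <= t
    that maximizes information gain on one numeric feature."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (0.0, None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - cond
        if gain > best[0]:
            p = i / n  # fraction of samples sent to the left branch
            split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gain, threshold, split_info)
    return best

def gain_ratio(values, labels):
    """Gain ratio (gain / split information) of the best binary split."""
    gain, threshold, split_info = best_threshold(values, labels)
    return (gain / split_info if split_info > 0 else 0.0), threshold

def rank_features(features, labels):
    """Feature names sorted by gain ratio, highest first (hypothetical helper)."""
    scores = {name: gain_ratio(vals, labels)[0] for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

def stump_accuracy(values, labels, threshold):
    """Training accuracy of a one-feature stump that predicts the majority
    class on each side of the threshold."""
    if threshold is None:  # no useful split: fall back to the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return sum(lab == majority for lab in labels) / len(labels)
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    left_cls = Counter(left).most_common(1)[0][0]
    right_cls = Counter(right).most_common(1)[0][0]
    correct = sum((left_cls if v <= threshold else right_cls) == lab
                  for v, lab in zip(values, labels))
    return correct / len(labels)

# Invented toy values (NOT data from this study): per-text category percentages.
labels = ["R", "R", "R", "R", "D", "D", "D", "D"]
features = {
    "function": [52.1, 55.3, 50.8, 54.0, 47.2, 45.9, 51.5, 46.7],
    "posemo": [2.1, 3.4, 2.8, 3.0, 2.9, 2.2, 3.1, 2.5],
}
ranking = rank_features(features, labels)
top = ranking[0]
_, thr = gain_ratio(features[top], labels)
print(ranking, stump_accuracy(features[top], labels, thr))  # → ['function', 'posemo'] 0.875
```

Choosing the threshold by raw gain and then dividing by split information follows the C4.5 convention for numeric attributes; a real replication would also hold out test data (for example with k-fold cross-validation) rather than scoring the stump on the texts it was fitted to.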
