1/Sigma Term Weighting Scheme for Sentiment Analysis

Large amounts of data on the web can provide valuable information. For example, product reviews help business owners measure customer satisfaction. Sentiment analysis classifies texts into two polarities: positive and negative. This paper evaluates a new term weighting scheme, called one-over-sigma (1/sigma), on benchmark movie review and tweet datasets for sentiment classification. The proposed scheme aims to improve classification performance. The results show that 1/sigma achieves higher accuracy than popular term weighting schemes. To verify whether entropy reflects the discriminating power of terms, we compare entropy values across the term weighting schemes.
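To illustrate how entropy can relate to a term's discriminating power, the following minimal sketch computes the Shannon entropy of a term's occurrences over the two polarity classes. It is an illustration under stated assumptions, not the entropy measure or the 1/sigma scheme defined in this paper: the function name `class_entropy` and the use of raw positive and negative occurrence counts are hypothetical choices made for the example.

```python
import math


def class_entropy(pos_count, neg_count):
    """Shannon entropy (in bits) of a term's occurrences over two classes.

    Assumption: the term's class distribution is estimated from raw
    occurrence counts in positive and negative documents.
    """
    total = pos_count + neg_count
    if total == 0:
        raise ValueError("term has no occurrences")
    entropy = 0.0
    for count in (pos_count, neg_count):
        if count > 0:  # a zero-probability class contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# A term spread evenly over both classes carries no polarity signal.
even = class_entropy(50, 50)
# A term seen in only one class is maximally discriminating.
one_sided = class_entropy(10, 0)
# A skewed term falls in between.
skewed = class_entropy(90, 10)
```

Under this reading, lower entropy indicates a term concentrated in one polarity, so a weighting scheme that favors low-entropy terms would be expected to emphasize discriminative vocabulary.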




