An Enhanced Support Vector Machine-Based Approach for Sentiment Classification of Arabic Tweets of Different Dialects

Arabic Sentiment Analysis (SA) is an active research field with many open problems. This paper proposes a set of pre-processing steps and a modified methodology that improve the accuracy of classification with a standard Support Vector Machine (SVM). The approach is evaluated on two datasets that are publicly available for academic use: the Arabic Sentiment Tweets Dataset (ASTD) and the Extended Arabic Tweets Sentiment Dataset (Extended-ATSD). The results show that the classification accuracy approaches 86%.
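The abstract does not list the individual pre-processing steps, so the following is only a minimal sketch of normalization steps commonly applied to Arabic tweets before SVM classification. The function name `normalize_tweet`, the regular expressions, and the choice of steps are all assumptions made for illustration and are not taken from the paper.

```python
import re

# All patterns below are illustrative assumptions, not the paper's actual steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")          # web links
MENTION_RE = re.compile(r"@\w+")                       # @user mentions
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")   # tashkeel marks
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]") # alef with madda/hamza
ELONGATION_RE = re.compile(r"(.)\1{2,}")               # 3+ repeats of one character
TATWEEL = "\u0640"                                     # kashida stretching character


def normalize_tweet(text: str) -> str:
    """Apply a hypothetical normalization pipeline to one Arabic tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep hashtag words but drop the '#' and split on underscores.
    text = text.replace("#", " ").replace("_", " ")
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    # Map alef variants to the bare alef (U+0627).
    text = ALEF_VARIANTS_RE.sub("\u0627", text)
    # Collapse expressive elongation (a character repeated 3+ times) to one.
    text = ELONGATION_RE.sub(r"\1", text)
    # Collapse whitespace left behind by the removals.
    return " ".join(text.split())
```

The normalized text would then typically be vectorized (for example with bag-of-words or TF-IDF features) and passed to an SVM classifier; that stage is omitted here because the abstract gives no details about the features or kernel used.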




