Saudi Twitter Corpus for Sentiment Analysis

Sentiment analysis (SA) has received growing attention in Arabic
language research. However, few studies have applied SA directly to
Arabic, owing to the lack of publicly available datasets for the
language. This paper partially bridges that gap by focusing on one
Arabic dialect, the Saudi dialect. It presents an annotated dataset of
4700 tweets for Saudi dialect sentiment analysis, with an
inter-annotator agreement of kappa (K) = 0.807. In future work, we plan
to extend this corpus and to build a large-scale lexicon for the Saudi
dialect from it.
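
For readers unfamiliar with the agreement figure reported above (K = 0.807), the following is a minimal sketch of how a kappa coefficient can be computed. The abstract does not say which kappa variant or how many annotators were used, so the sketch assumes Cohen's kappa for two annotators. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels; the paper's actual
    annotation setup is not described in the abstract.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Toy example with hypothetical labels (not real corpus data).
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this toy example the annotators agree on 3 of 4 items (observed agreement 0.75) while chance agreement is 0.5, which gives a kappa of 0.5. Values closer to 1 indicate agreement well above chance.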



