Arabic Word Semantic Similarity

This paper presents an Arabic word semantic similarity (WSS) benchmark dataset. It is the first of its kind for Arabic, developed specifically to assess the accuracy of word semantic similarity measures. Semantic similarity is an essential component of numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English, and to the best of our knowledge no word similarity measure has been developed specifically for Arabic. The dataset consists of 70 word pairs. It was produced using new methods together with the best available techniques for selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. The dataset is expected to make a substantial contribution to future work on Arabic WSS, and we hope it will serve as a reference against which different methodologies in the field can be evaluated and compared.
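As an illustration of how a benchmark of this kind is typically used, the sketch below averages participant ratings into one overall rating per word pair and then correlates those ratings with the scores of a similarity measure. Several things here are assumptions rather than details taken from the paper: the arithmetic mean as the aggregation rule, the Pearson coefficient as the agreement statistic, the 0 to 4 rating scale, and all pair names and numbers, which are hypothetical placeholders and not data from the dataset.

```python
from math import sqrt
from statistics import mean


def overall_ratings(ratings_by_pair):
    """Average each word pair's participant ratings (assumed aggregation rule)."""
    return {pair: mean(scores) for pair, scores in ratings_by_pair.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical participant ratings on an assumed 0-4 scale (placeholder pairs).
human = {
    ("word_1a", "word_1b"): [3.8, 4.0, 3.6],
    ("word_2a", "word_2b"): [2.1, 1.9, 2.4],
    ("word_3a", "word_3b"): [0.2, 0.0, 0.5],
}

# Hypothetical scores from some similarity measure under evaluation.
measure = {
    ("word_1a", "word_1b"): 0.91,
    ("word_2a", "word_2b"): 0.48,
    ("word_3a", "word_3b"): 0.10,
}

benchmark = overall_ratings(human)
pairs = sorted(benchmark)
r = pearson([benchmark[p] for p in pairs], [measure[p] for p in pairs])
print(f"Pearson r between human ratings and measure: {r:.3f}")
```

A higher correlation would indicate closer agreement between the measure and human judgement on the benchmark pairs; other statistics, such as a rank correlation, could be substituted in the same place.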




