Tagging by Combining Rules-Based Method and Memory-Based Learning

Many natural language expressions are ambiguous and can only be interpreted by drawing on other sources of information. Whether the word تعاون is to be read as a noun or as a verb depends on the presence of contextual cues. To interpret words, we need to be able to discriminate between their different usages. This paper proposes a hybrid of a rule-based method and a machine learning method for tagging Arabic words. A particularity of Arabic is that a word may be composed of a stem plus affixes and clitics (affixes include inflectional markers for tense, gender, and number; clitics include some prepositions, conjunctions, and others), and a small number of rules dominates the performance. Tagging is closely related to the notion of word class used in syntax. The method first applies rules (which consider the post-position, the ending of a word, and patterns), and the anomalies are then corrected by a memory-based learning (MBL) method. Memory-based learning is an efficient method for integrating various sources of information and for handling exceptional data in natural language processing tasks. In the second step, the exceptional cases of the rules are checked, and more information is made available to the learner for treating those exceptional cases. To evaluate the proposed method, a number of experiments have been run, which also serve to assess the importance of the various sources of information in learning.
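
The two-step idea described above (rules first, then memory-based correction of their exceptional cases) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three toy rules, the preposition list, the four features, the overlap similarity, the threshold, and the names `rule_tag`, `features`, `overlap` and `hybrid_tag` are all hypothetical. The memory holds only cases where a rule was wrong, and a stored case overrides the rule when the new instance is similar enough to it.

```python
# Hypothetical toy rule set; the paper's actual rules are not listed in the abstract.
PREPOSITIONS = {"في", "من", "على", "إلى"}


def rule_tag(prev_word, word):
    """Return (tag, rule_name), or (None, None) when no rule fires."""
    if prev_word in PREPOSITIONS:      # post-position: word follows a preposition
        return "NOUN", "after_preposition"
    if word.startswith("ال"):          # pattern: looks like the definite article
        return "NOUN", "definite_article"
    if word.endswith("ة"):             # ending: ta marbuta
        return "NOUN", "ta_marbuta"
    return None, None


def features(prev_word, word, rule_name):
    """Assumed feature vector: context word, first two letters, last letter, fired rule."""
    return (prev_word, word[:2], word[-1:], rule_name)


def overlap(a, b):
    """Overlap similarity: number of feature positions with identical values."""
    return sum(x == y for x, y in zip(a, b))


def hybrid_tag(prev_word, word, exceptions, threshold=3):
    """Apply the rules, then let a sufficiently similar stored exception override them."""
    tag, rule_name = rule_tag(prev_word, word)
    feats = features(prev_word, word, rule_name)
    # Nearest neighbour among the stored exceptions (None if the memory is empty).
    best = max(exceptions, key=lambda e: overlap(e[0], feats), default=None)
    if best is not None and overlap(best[0], feats) >= threshold:
        return best[1]
    return tag or "UNKNOWN"


# Memory of rule errors: التقى begins with ال but is a verb, so the
# definite-article rule mis-tags it; the corrected tag is stored.
exceptions = [(features("ثم", "التقى", "definite_article"), "VERB")]

print(hybrid_tag("حين", "التقى", exceptions))    # similar to the stored exception
print(hybrid_tag("في", "المدرسة", exceptions))   # handled by the rules alone
```

Storing only the rule errors keeps the memory small and leaves the rules in charge of the regular cases; a higher threshold makes the learner more conservative about overriding them.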



