Effective Features for Disambiguation of Turkish Verbs

This paper summarizes experiments aimed at identifying effective features for the disambiguation of Turkish verbs. Word sense disambiguation is an active area of research in which verbs play a dominant role. On average, verbs have more senses than other word types, and identifying effective features for verbs may lead to improvements for other word types as well. We consider only syntactic features that can be obtained from the corpus, and we test them using several well-known machine learning algorithms.




