Knowledge Required for Avoiding Lexical Errors in Machine Translation

This research aims to identify the causes of wrong lexical selections in machine translation (MT), rather than to categorize lexical errors, which has been the main practice in error analysis. By manually examining and analyzing lexical errors in the output of an MT system, it suggests what knowledge would help the system reduce such errors.



