The Efficiency of Association Measures in Automatic Extraction of Collocations: Exclusivity and Frequency

This paper deals with the automatic extraction of ‘adjective + noun’ collocations for 20 adjectives using four association measures: T-score, MI, Log Dice, and Log Likelihood. The emphasis is on Log Likelihood and Log Dice, and an argument for their suitability in this experiment is presented. The nodes are the 20 adjectives, all of which are false friends between English and French. For each node, a noun candidate is chosen only if it appears among the top ten collocates in both of two lists, one sorted by Log Likelihood and the other by Log Dice. Meeting this criterion ensures that the chosen candidates are both exclusive and significant collocates of the node, which makes them strong noun candidates. The top-ten lists are not filtered: technical terms, function words, and stop words are retained for the purposes of the analysis. Of the 20 adjectives, 15 yielded an ‘adjective + noun’ collocation through the consensus of the Log Likelihood and Log Dice scores on the top ten noun collocates. The resulting list of automatically extracted ‘adjective + noun’ collocations will form the core of a translation test in which Algerian students of translation are asked to render the collocations into Arabic. The ultimate goal of the test is to examine the influence of French as a second language on English as a foreign language in the Algerian context.
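
The selection criterion described above can be illustrated with a minimal sketch. It assumes the commonly used contingency-table definitions of the four measures (T-score = (O11 - E11) / sqrt(O11), MI = log2(O11 / E11), Log Dice = 14 + log2(2·O11 / (f_node + f_coll)), and Log Likelihood = 2·Σ O·ln(O / E)); the corpus tool used in the study may compute them slightly differently. The function names, the input format, and the toy frequencies are hypothetical and serve illustration only.

```python
import math


def association_scores(o11, f_node, f_coll, n):
    """Score one node-collocate pair from raw frequencies.

    o11    : co-occurrence frequency of node and collocate (must be > 0)
    f_node : corpus frequency of the node (the adjective)
    f_coll : corpus frequency of the collocate (the noun)
    n      : corpus size (assumed here to be the total token count)
    """
    # Observed 2x2 contingency table.
    observed = [o11, f_node - o11, f_coll - o11, n - f_node - f_coll + o11]
    # Expected frequencies under independence.
    expected = [
        f_node * f_coll / n,
        f_node * (n - f_coll) / n,
        (n - f_node) * f_coll / n,
        (n - f_node) * (n - f_coll) / n,
    ]
    e11 = expected[0]
    return {
        "t_score": (o11 - e11) / math.sqrt(o11),
        "mi": math.log2(o11 / e11),
        "log_dice": 14 + math.log2(2 * o11 / (f_node + f_coll)),
        # Cells with an observed count of zero contribute nothing.
        "log_likelihood": 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        ),
    }


def consensus_collocates(candidates, f_node, n, top_k=10):
    """Return the nouns ranked in the top_k by BOTH Log Likelihood and Log Dice.

    candidates maps each noun to a tuple (o11, f_coll). No stop-word or
    term filtering is applied, mirroring the unfiltered lists in the text.
    The result keeps the Log Likelihood ordering.
    """
    scores = {
        noun: association_scores(o11, f_node, f_coll, n)
        for noun, (o11, f_coll) in candidates.items()
    }

    def top(measure):
        ranked = sorted(scores, key=lambda w: scores[w][measure], reverse=True)
        return ranked[:top_k]

    dice_top = set(top("log_dice"))
    return [noun for noun in top("log_likelihood") if noun in dice_top]


# Toy, invented frequencies for one hypothetical adjective node.
toy_candidates = {
    "noun_a": (50, 60),       # rare noun, almost always with the node
    "noun_b": (400, 50_000),  # frequent noun, many co-occurrences
    "noun_c": (5, 6),         # very rare noun
}
shortlist = consensus_collocates(toy_candidates, f_node=1_000, n=1_000_000, top_k=2)
```

With these toy counts the two measures rank noun_a and noun_b in opposite orders, so a threshold of two yields both nouns while a threshold of one yields no consensus at all. This is the kind of disagreement that can leave a node without an agreed collocate.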




