The Efficiency of Association Measures in Automatic Extraction of Collocations: Exclusivity and Frequency

This paper deals with automatic extraction of 20 ‘adjective + noun’ collocations using four different association measures: T-score, MI, Log Dice, and Log Likelihood with most emphasis on mainly Log Likelihood and Log Dice scores for which an argument for their suitability in this experiment is to be presented. The nodes of the chosen collocates are 20 adjectival false friends between English and French. The noun candidate to be chosen needs to occur with a threshold of top ten collocates in two lists in which the results are sorted by Log Likelihood and Log Dice. The fulfillment of this criterion will guarantee that the chosen candidates are both exclusive and significant noun collocates and thereby, they make perfect noun candidates for the nodes. The results of the top 10 collocates sorted by Log Dice and Log Likelihood are not to be filtered. Thereby technical terms, function words, and stop words are not to be removed for the purposes of the analysis. Out of 20 adjectives, 15 ‘adjective + noun’ collocations have been extracted by the means of consensus of Log Likelihood and Log Dice scores on the top 10 noun collocates. The generated list of the automatic extracted ‘adjective + noun’ collocations will serve as the bulk of a translation test in which Algerian students of translation are asked to render these collocations into Arabic. The ultimate goal of this test is to test French influence as a Second Language on English as a Foreign Language in the Algerian context.