Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words

Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words, in the form of BaseNP, have shown significant improvements in precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve retrieval performance even though more than 62% of word formation in the Malay language is based on derivational affixes and suffixes.




