Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems

Documents retrieval in Information Retrieval Systems (IRS) is generally about understanding of information in the documents concern. The more the system able to understand the contents of documents the more effective will be the retrieval outcomes. But understanding of the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through keywords approach using vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in retrieving process, we are a step forwards in applying semantic technology in IRS. Word stemming is a process in morphological analysis under natural language processing, before syntactic and semantic analysis. We have developed algorithms for Malay and Arabic and incorporated stemming in our experimental systems in order to measure retrieval effectiveness. The results have shown that the retrieval effectiveness has increased when stemming is used in the systems.