Abstract: Web search engines are designed to retrieve and
extract the information in the web databases and to return dynamic
web pages. The Semantic Web is an extension of the current web in
which it includes semantic content in web pages. The main goal of
semantic web is to promote the quality of the current web by
changing its contents into machine understandable form. Therefore,
the milestone of semantic web is to have semantic level information
in the web. Nowadays, people use different keyword- based search
engines to find the relevant information they need from the web.
But many of the words are polysemous. When these words are
used to query a search engine, it displays the Search Result Records
(SRRs) with different meanings. The SRRs with similar meanings are
grouped together based on Word Sense Disambiguation (WSD). In
addition to that semantic annotation is also performed to improve the
efficiency of search result records. Semantic Annotation is the
process of adding the semantic metadata to web resources. Thus the
grouped SRRs are annotated and generate a summary which
describes the information in SRRs. But the automatic semantic
annotation is a significant challenge in the semantic web. Here
ontology and knowledge based representation are used to annotate
the web pages.
Abstract: This paper summarizes the results of some experiments for finding the effective features for disambiguation of Turkish verbs. Word sense disambiguation is a current area of investigation in which verbs have the dominant role. Generally verbs have more senses than the other types of words in the average and detecting these features for verbs may lead to some improvements for other word types. In this paper we have considered only the syntactical features that can be obtained from the corpus and tested by using some famous machine learning algorithms.
Abstract: Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.