Abstract: Information retrieval has become an important field of study and research under computer science due to explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects such as document representation, similarity measure and query expansion.
Abstract: Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
Abstract: As a popular rank-reduced vector space approach,
Latent Semantic Indexing (LSI) has been used in information
retrieval and other applications. In this paper, an LSI-based content
vector model for text classification is presented, which constructs
multiple augmented category LSI spaces and classifies text by their
content. The model integrates the class discriminative information
from the training data and is equipped with several pertinent feature
selection and text classification algorithms. The proposed classifier
has been applied to email classification and its experiments on a
benchmark spam testing corpus (PU1) have shown that the approach
represents a competitive alternative to other email classifiers based
on the well-known SVM and naïve Bayes algorithms.
Abstract: Documents retrieval in Information Retrieval
Systems (IRS) is generally about understanding of
information in the documents concern. The more the system
able to understand the contents of documents the more
effective will be the retrieval outcomes. But understanding of the
contents is a very complex task. Conventional IRS apply algorithms
that can only approximate the meaning of document contents through
keywords approach using vector space model. Keywords may be
unstemmed or stemmed. When keywords are stemmed and conflated
in retrieving process, we are a step forwards in applying semantic
technology in IRS. Word stemming is a process in morphological
analysis under natural language processing, before syntactic and
semantic analysis. We have developed algorithms for Malay and
Arabic and incorporated stemming in our experimental systems in
order to measure retrieval effectiveness. The results have shown that
the retrieval effectiveness has increased when stemming is used in
the systems.
Abstract: With the development of virtual communities, there is
an increase in the number of members in Virtual Communities (VCs).
Many join VCs with the objective of sharing their knowledge and
seeking knowledge from others. Despite the eagerness of sharing
knowledge and receiving knowledge through VCs, there is no
standard of assessing ones knowledge sharing capabilities and
prospects of knowledge sharing. This paper developed a vector space
model to assess the knowledge sharing prospect of VC users.