Abstract: We present a method to create special domain
collections from news sites. The method only requires a single
sample article as a seed. No prior corpus statistics are needed and the
method is applicable to multiple languages. We examine various
similarity measures and the creation of document collections for
English and Japanese. The main contributions are as follows. First,
the algorithm can build special domain collections from as little as
one sample document. Second, unlike other algorithms, it does not
require a second "general" corpus to compute statistics. Third, in our
testing the algorithm outperformed others in creating collections
made up of highly relevant articles.
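As an illustration of seed-based collection building, the following Python sketch scores candidate articles against a single seed article. It uses only term frequencies from the seed and the candidates themselves, so no second corpus is consulted, and it offers two possible similarity measures, cosine and Jaccard. The tokenizer, the `threshold` value of 0.3 and all function names are assumptions for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens. A real system would need a language-aware
    tokenizer (Japanese, for example, is not space-delimited)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap of the two vocabularies, ignoring frequencies."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def build_collection(seed, candidates, measure=cosine, threshold=0.3):
    """Keep candidates whose similarity to the seed meets the (hypothetical)
    threshold, most similar first. Only the seed and the candidates are used."""
    seed_vec = Counter(tokens(seed))
    kept = []
    for article in candidates:
        score = measure(seed_vec, Counter(tokens(article)))
        if score >= threshold:
            kept.append((score, article))
    return [article for _, article in sorted(kept, reverse=True)]
```

Passing `measure=jaccard` swaps the similarity measure without changing the rest of the pipeline, which is one simple way to compare measures on the same candidate set.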
Abstract: XML files contain data in a well-formatted manner. Studying the format or the semantics of the grammar can help retrieve the data quickly. Many algorithms have been described for searching data in XML files. A number of approaches rely on the data structure, while others relate to the contents of the document. In these cases, the user must know the structure of the document, whereas information retrieval techniques based on NLP are related to the content of the document. Hence the results may be irrelevant or only partly successful, and the search may take more time. This paper presents fast XML retrieval techniques using a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns a value to each tag in the file. To query the system, a user is not constrained to a fixed query format.
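A minimal sketch of an index that records both content and structure, assuming Python's standard `xml.etree.ElementTree`. Each distinct tag receives an integer (a hypothetical stand-in for the per-tag value mentioned above, whose actual scheme is not given here), and each term is indexed under the path of tags that contains it. The query function accepts free keywords, optionally mixed with tag names. This is not the RXML technique itself.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(xml_text):
    """Return (tag_ids, index).

    tag_ids: hypothetical integer value per distinct tag name.
    index: term -> set of tag paths (e.g. "library/book/title") containing it.
    Tail text of mixed-content elements is ignored for brevity."""
    root = ET.fromstring(xml_text)
    tag_ids = {}
    index = defaultdict(set)

    def walk(elem, path):
        tag_ids.setdefault(elem.tag, len(tag_ids))
        path = path + (elem.tag,)
        for term in re.findall(r"\w+", (elem.text or "").lower()):
            index[term].add("/".join(path))
        for child in elem:
            walk(child, path)

    walk(root, ())
    return tag_ids, index


def search(tag_ids, index, query):
    """Free-form query: words that are tag names restrict the paths,
    all other words must occur as content terms."""
    words = query.lower().split()
    tags = [w for w in words if w in tag_ids]
    terms = [w for w in words if w not in tag_ids]
    if not terms:
        return set()
    paths = set.intersection(*(index.get(t, set()) for t in terms))
    return {p for p in paths if all(tag in p.split("/") for tag in tags)}
```

Because tag names and content words can be mixed freely in one query string, the user does not need to know the document structure in advance, although naming a tag narrows the results.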
Abstract: Text categorization techniques are widely used in many Information Retrieval (IR) applications. In this paper, we propose a simple but efficient method that can automatically find the relationship between any pair of terms and documents and establish an indexing matrix for text categorization. We call this method the Indexing Matrix Categorization Machine (IMCM). Several experiments are conducted to show the efficiency and robustness of our algorithm.
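IMCM's exact formulas are not given in this abstract, so the sketch below only illustrates the general idea of an indexing matrix for categorization. It builds a term-by-category matrix from labeled training documents and assigns a new document to the category whose term weights sum highest. The weighting (the share of a term's occurrences that falls in each category) and all names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokens(text):
    return re.findall(r"\w+", text.lower())


def build_matrix(training):
    """training: list of (text, category) pairs.
    Returns matrix[term][category] = fraction of the term's occurrences
    that fall in that category (a hypothetical weighting)."""
    counts = defaultdict(Counter)
    for text, category in training:
        for term in tokens(text):
            counts[term][category] += 1
    matrix = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        matrix[term] = {cat: n / total for cat, n in per_category.items()}
    return matrix


def categorize(matrix, text):
    """Sum each category's weights over the document's terms; unknown terms are skipped."""
    scores = Counter()
    for term in tokens(text):
        for category, weight in matrix.get(term, {}).items():
            scores[category] += weight
    return scores.most_common(1)[0][0] if scores else None
```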
Abstract: In the current age, retrieval of relevant information
from massive amounts of data is a challenging task. Over the years,
precise and relevant retrieval of information has attained high
significance. There is a growing need in the market to build systems
that can retrieve multimedia information that precisely meets the
user's current needs. In this paper, we introduce a framework
for refining query results before showing them to the user, using ambient
intelligence, user profile, group profile, user location, time, day, user
device type and extracted features. A prototype tool was also
developed to demonstrate the efficiency of the proposed approach.
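One way to picture result refinement is as re-ranking: each result's engine score is adjusted by how well the result matches the user's context. The Python sketch below assumes hypothetical context fields (profile interests, group interests, location, hour, device) and hypothetical weights in `WEIGHTS`; it is not the framework's actual scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    interests: set          # from the user profile
    group_interests: set    # from the group profile
    location: str
    hour: int               # time of day, 0-23
    device: str             # e.g. "mobile" or "desktop"


@dataclass
class Result:
    title: str
    base_score: float                            # score from the retrieval engine
    tags: set = field(default_factory=set)       # extracted features
    locations: set = field(default_factory=set)  # where the item is relevant
    devices: set = field(default_factory=set)    # devices the item suits
    hours: range = range(24)                     # hours when the item is relevant


# Hypothetical weights for each context signal.
WEIGHTS = {"profile": 0.4, "group": 0.2, "location": 0.2, "time": 0.1, "device": 0.1}


def refine(results, ctx):
    """Re-rank results by base score plus weighted context matches."""
    def score(r):
        n_tags = max(len(r.tags), 1)
        s = r.base_score
        s += WEIGHTS["profile"] * len(r.tags & ctx.interests) / n_tags
        s += WEIGHTS["group"] * len(r.tags & ctx.group_interests) / n_tags
        s += WEIGHTS["location"] * (ctx.location in r.locations)
        s += WEIGHTS["time"] * (ctx.hour in r.hours)
        s += WEIGHTS["device"] * (ctx.device in r.devices)
        return s
    return sorted(results, key=score, reverse=True)
```

Keeping the engine score as the base term means context only reorders results; it never introduces items the engine did not retrieve.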
Abstract: The need for efficient information retrieval has
increased more than ever in recent years because of the frequent use of
digital information in our lives. We see a lot of work in the area of
textual information, but in multimedia information we cannot find
much progress. For text-based information, technologies such as data
mining and data marts are now in use; they grew out of the
basic database concepts introduced around 1960.
In image search, and especially in image identification,
computerized systems are still at a very early stage. Even in the area of image
search we cannot see as much progress as in text-based
search techniques. One main reason for this is the widespread roots
of image search, in which many areas such as artificial intelligence,
statistics, image processing and pattern recognition play a role. Even
human psychology, perception and cultural diversity
play their part in the design of a good and efficient image recognition
and retrieval system.
A new object-based search technique is presented in this paper,
in which objects in the image are identified on the basis of their
geometrical shapes and other features such as color and texture, and
object correlation augments this search process.
To focus on object identification, simple images were
selected for this work to reduce the role of segmentation in the overall
process; however, the same technique can also be applied to other
images.
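The sketch below illustrates object matching on shape, color and texture under stated assumptions. Segmentation is assumed to have already produced the objects; shape is summarized by circularity, 4πA/P² (1.0 for a circle, π/4 for a square); color is the mean RGB value; texture is a single number in [0, 1]. The weights are hypothetical, and requiring every query object to find a match is only a crude stand-in for object correlation, not the paper's procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageObject:
    area: float
    perimeter: float
    color: tuple      # mean (R, G, B), each 0-255
    texture: float    # hypothetical normalized texture measure in [0, 1]

    def circularity(self):
        # 4*pi*A / P^2: 1.0 for a circle, pi/4 (about 0.785) for a square
        return 4 * math.pi * self.area / self.perimeter ** 2


MAX_COLOR = math.dist((0, 0, 0), (255, 255, 255))


def object_distance(a, b, w_shape=0.5, w_color=0.3, w_texture=0.2):
    """Weighted distance over shape, color and texture (hypothetical weights)."""
    shape = abs(a.circularity() - b.circularity())
    color = math.dist(a.color, b.color) / MAX_COLOR
    texture = abs(a.texture - b.texture)
    return w_shape * shape + w_color * color + w_texture * texture


def image_score(query_objects, image_objects):
    """Average best-match distance over the query objects; lower is better.
    Every query object must find a partner in the image, a rough proxy
    for objects that are expected to appear together."""
    return sum(min(object_distance(q, o) for o in image_objects)
               for q in query_objects) / len(query_objects)
```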
Abstract: Information Retrieval aims to study
models and build systems that allow a user to find the
relevant documents that meet his or her information need. Information
search remains a difficult problem because of the
difficulty of representing and processing natural language phenomena such
as polysemy. Intentional Structures promise to be a new paradigm to
extend existing document structures and to enhance the different
phases of document processing, such as creation, editing, search and
retrieval. Recognizing the intentions of the authors of texts can reduce
the scope of this problem. In this article, we present an intention
recognition system based on a semi-automatic method of
extracting intentional information from a corpus of text.
This system is also able to update the ontology of intentions to
enrich the knowledge base containing all possible intentions
of a domain. This approach uses the construction of a semi-formal
ontology, which is considered the conceptualization of the intentional
information contained in a text. Experiments on scientific
publications in the field of computer science were conducted to
validate this approach.
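A toy version of the semi-automatic loop described above: intentions are recognized by matching cue phrases from a small ontology, and a human expert approves a new cue before it enriches the ontology. The intention classes and cue phrases are hypothetical, and real intention recognition would need much richer linguistic analysis than substring matching.

```python
import re

# Hypothetical seed ontology: intention class -> cue phrases.
ontology = {
    "propose": {"we propose", "we present", "this paper presents"},
    "evaluate": {"we evaluate", "experiments show", "we compare"},
}


def sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize(text, ontology):
    """Tag each sentence with the intentions whose cue phrases it contains."""
    tagged = []
    for s in sentences(text):
        low = s.lower()
        found = sorted(i for i, cues in ontology.items() if any(c in low for c in cues))
        tagged.append((s, found))
    return tagged


def update_ontology(ontology, intention, cue, approved):
    """Semi-automatic step: add a proposed cue only if an expert approved it."""
    if approved:
        ontology.setdefault(intention, set()).add(cue.lower())
    return ontology
```

Sentences that come back with an empty intention list are natural candidates to show the expert, which is where the semi-automatic enrichment would happen.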
Abstract: Searching the Internet is currently very popular, especially in academia. The huge volume of educational information, such as research papers, overloads users. Community-based web sites have therefore been developed to help users find information more easily by customizing a web site to the needs of each specific user or group of users. This paper proposes using association rules to analyze community groups in research-paper bookmarking. A set of design goals for community group frameworks is developed and discussed. Additionally, the researcher uses association rule discovery to analyze the initial relations between the antecedent and the consequent of rules within user groups, in order to generate ideas for improving search-result ranking and developing a recommender system.
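The two measures that drive association rule analysis are support and confidence. The following Python sketch mines one-to-one rules A -> B from per-user bookmark sets by brute force over item pairs, a simplification of Apriori-style mining; the thresholds and paper identifiers are hypothetical.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine rules A -> B from a list of bookmark sets (one set per user).

    support(A -> B)    = fraction of users who bookmarked both A and B
    confidence(A -> B) = support({A, B}) / support({A})
    Returns (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b})
        if both < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = both / support({x})  # support({x}) > 0: x occurs somewhere
            if conf >= min_confidence:
                rules.append((x, y, both, conf))
    return rules
```

A rule such as p3 -> p1 with high confidence suggests recommending p1 to a user who has bookmarked p3, which is the kind of signal that could feed result ranking or a recommender.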
Abstract: Following the loss of NASA's Space Shuttle
Columbia in 2003, it was determined that problems in the agency's
organization created an environment that led to the accident. One
component of the proposed solution resulted in the formation of the
NASA Engineering Network (NEN), a suite of information retrieval
and knowledge-sharing tools. This paper describes the
implementation of communities of practice, which are formed along
engineering disciplines. Communities of practice enable engineers to
leverage their knowledge and best practices to collaborate and take
what they learn back to their jobs and embed it into the
procedures of the agency. This case study offers insight into using
traditional engineering disciplines for virtual collaboration, including
lessons learned during the creation and establishment of NASA's
communities.
Abstract: In this study, a K-Means-like clustering technique with a hierarchical initial set (HKM) has been implemented. The goal of this study is to show that clustering document sets enhances precision in information retrieval systems, as Bellot & El-Beze showed for the French language. A comparison is made between a traditional information retrieval system and the clustered one, and the effect of increasing the number of clusters on precision is also studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). Hierarchical K-Means-like clustering (HKM) with 3 clusters over 242 Arabic abstracts from the Saudi Arabian National Computer Conference was found to give significantly better results than the traditional information retrieval system without clustering. It was also found that increasing the number of clusters is not necessary to improve precision further.
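A compact Python sketch of the pipeline described above, under assumptions: documents are already tokenized (Arabic stemming and normalization are not shown), TF*IDF uses idf = log(N/df), similarity is cosine, and the "hierarchical initial set" is taken to be centroid-linkage agglomerative merging down to k groups whose centroids seed k-means. The paper's exact HKM procedure may differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists -> one dict per doc, term -> tf * log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def hierarchical_seeds(vectors, k):
    """Merge the two most similar groups (by centroid cosine) until k remain."""
    groups = [[i] for i in range(len(vectors))]
    while len(groups) > k:
        cents = [centroid([vectors[i] for i in g]) for g in groups]
        i, j = max(((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
                   key=lambda p: cosine(cents[p[0]], cents[p[1]]))
        groups[i] += groups[j]
        del groups[j]  # j > i, so index i is unaffected
    return [centroid([vectors[i] for i in g]) for g in groups]


def hkm(vectors, k, iterations=10):
    """K-means refinement starting from the hierarchical seeds; returns labels."""
    cents = hierarchical_seeds(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, cents[c])) for v in vectors]
        new_cents = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            new_cents.append(centroid(members) if members else cents[c])
        cents = new_cents
    return labels
```

Seeding k-means from an agglomerative pass avoids the sensitivity of random initial centroids, which is presumably the motivation for a hierarchical initial set; at query time, retrieval could then be restricted to the cluster whose centroid best matches the query.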