Abstract: Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.
Abstract: When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (
Abstract: Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.
Abstract: Seeking and sharing knowledge on online forums
have made them popular in recent years. Although online forums are
valuable sources of information, due to variety of sources of
messages, retrieving reliable threads with high quality content is an
issue. Majority of the existing information retrieval systems ignore
the quality of retrieved documents, particularly, in the field of thread
retrieval. In this research, we present an approach that employs
various quality features in order to investigate the quality of retrieved
threads. Different aspects of content quality, including completeness,
comprehensiveness, and politeness, are assessed using these features,
which lead to finding not only textual, but also conceptual relevant
threads for a user query within a forum. To analyse the influence of
the features, we used an adopted version of voting model thread
search as a retrieval system. We equipped it with each feature solely
and also various combinations of features in turn during multiple
runs. The results show that incorporating the quality features
enhances the effectiveness of the utilised retrieval system
significantly.
Abstract: This paper presents an application of a “Systematic
Soft Domain Driven Design Framework” as a soft systems approach
to domain-driven design of information systems development. The
framework use SSM as a guiding methodology within which we have
embedded a sequence of design tasks based on the UML leading to
the implementation of a software system using the Naked Objects
framework. This framework have been used in action research
projects that have involved the investigation and modelling of
business processes using object-oriented domain models and the
implementation of software systems based on those domain models.
Within this framework, Soft Systems Methodology (SSM) is used as
a guiding methodology to explore the problem situation and to
develop the domain model using UML for the given business
domain. The framework is proposed and evaluated in our previous
works, and a real case study “Information Retrieval System for
academic research” is used, in this paper, to show further practice and
evaluation of the framework in different business domain. We argue
that there are advantages from combining and using techniques from
different methodologies in this way for business domain modelling.
The framework is overviewed and justified as multimethodology
using Mingers multimethodology ideas.
Abstract: Information retrieval has become an important field of study and research under computer science due to explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects such as document representation, similarity measure and query expansion.
Abstract: Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper tries to evaluate the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measures were conducted and two corpora with different external resources like Arabic WordNet (AWN) and the Arabic Dictionary (thesaurus) of Meaning (ADM) were used. Examination of the obtained results will allow us to measure the real contribution of this reformulation technique in improving the IRS performance.
Abstract: XML is a markup language which is becoming the
standard format for information representation and data exchange. A
major purpose of XML is the explicit representation of the logical
structure of a document. Much research has been performed to
exploit logical structure of documents in information retrieval in
order to precisely extract user information need from large
collections of XML documents. In this paper, we describe an XML
information retrieval weighting scheme that tries to find the most
relevant elements in XML documents in response to a user query.
We present this weighting model for information retrieval systems
that utilize plausible inferences to infer the relevance of elements in
XML documents. We also add to this model the Dempster-Shafer
theory of evidence to express the uncertainty in plausible inferences
and Dempster-Shafer rule of combination to combine evidences
derived from different inferences.
Abstract: Due to new distributed database applications such as
huge deductive database systems, the search complexity is constantly
increasing and we need better algorithms to speedup traditional
relational database queries. An optimal dynamic programming
method for such high dimensional queries has the big disadvantage of
its exponential order and thus we are interested in semi-optimal but
faster approaches. In this work we present a multi-agent based
mechanism to meet this demand and also compare the result with
some commonly used query optimization algorithms.
Abstract: This paper proposes rough set models with three
different level knowledge granules in incomplete information system
under tolerance relation by similarity between objects according to
their attribute values. Through introducing dominance relation on the
discourse to decompose similarity classes into three subclasses: little
better subclass, little worse subclass and vague subclass, it dismantles
lower and upper approximations into three components. By using
these components, retrieving information to find naturally hierarchical
expansions to queries and constructing answers to elaborative queries
can be effective. It illustrates the approach in applying rough set
models in the design of information retrieval system to access different
granular expanded documents. The proposed method enhances rough
set model application in the flexibility of expansions and elaborative
queries in information retrieval.
Abstract: Today, Genetic Algorithm has been used to solve
wide range of optimization problems. Some researches conduct on
applying Genetic Algorithm to text classification, summarization
and information retrieval system in text mining process. This
researches show a better performance due to the nature of Genetic
Algorithm. In this paper a new algorithm for using Genetic
Algorithm in concept weighting and topic identification, based on
concept standard deviation will be explored.
Abstract: In this paper, a model for an information retrieval
system is proposed which takes into account that knowledge about
documents and information need of users are dynamic. Two
methods are combined, one qualitative or symbolic and the other
quantitative or numeric, which are deemed suitable for many
clustering contexts, data analysis, concept exploring and
knowledge discovery. These two methods may be classified as
inductive learning techniques. In this model, they are introduced to
build “long term" knowledge about past queries and concepts in a
collection of documents. The “long term" knowledge can guide
and assist the user to formulate an initial query and can be
exploited in the process of retrieving relevant information. The
different kinds of knowledge are organized in different points of
view. This may be considered an enrichment of the exploration
level which is coherent with the concept of document/query
structure.
Abstract: This study investigates the use of genetic algorithms
in information retrieval. The method is shown to be applicable to
three well-known documents collections, where more relevant
documents are presented to users in the genetic modification. In this
paper we present a new fitness function for approximate information
retrieval which is very fast and very flexible, than cosine similarity
fitness function.
Abstract: In this paper, we propose a new model of English-
Vietnamese bilingual Information Retrieval system. Although there
are so many CLIR systems had been researched and built, the accuracy of searching results in different languages that the CLIR
system supports still need to improve, especially in finding bilingual documents. The problems identified in this paper are the limitation of
machine translation-s result and the extra large collections of document to be found. So we try to establish a different model to overcome these problems.
Abstract: In this paper we propose a multi-agent architecture for web information retrieval using fuzzy logic based result fusion mechanism. The model is designed in JADE framework and takes advantage of JXTA agent communication method to allow agent communication through firewalls and network address translators. This approach enables developers to build and deploy P2P applications through a unified medium to manage agent-based document retrieval from multiple sources.
Abstract: In this study a clustering technique has been implemented which is K-Means like with hierarchical initial set (HKM). The goal of this study is to prove that clustering document sets do enhancement precision on information retrieval systems, since it was proved by Bellot & El-Beze on French language. A comparison is made between the traditional information retrieval system and the clustered one. Also the effect of increasing number of clusters on precision is studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). It has been found that the effect of Hierarchical K-Means Like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference has significant results compared with traditional information retrieval system without clustering. Additionally it has been found that it is not necessary to increase the number of clusters to improve precision more.