Abstract: The evaluation of question answering systems is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available. However, when question answering systems became more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions while achieving higher-quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and then brings forward the call for new standard measures and the related considerations. As a short-term solution for evaluating the response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of different natures, this research presents a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI).
Abstract: Music segmentation is a key issue in music information
retrieval (MIR) as it provides an insight into the
internal structure of a composition. Structural information about
a composition can improve several tasks related to MIR such
as searching and browsing large music collections, visualizing
musical structure, lyric alignment, and music summarization.
The authors of this paper present the MTSSM framework, a two-layer
framework for the multi-track segmentation of symbolic
music. The strength of this framework lies in the combination of
existing methods for local track segmentation and the application
of global structure information spanning multiple tracks.
The first layer of the MTSSM uses various string matching
techniques to detect the best candidate segmentations for each
track of a multi-track composition independently. The second
layer combines all single track results and determines the best
segmentation for each track with respect to the global structure of
the composition.
Abstract: Efficient retrieval of multimedia objects has received enormous attention in recent years. A number of techniques have been suggested for the retrieval of textual information; however, relatively little has been suggested for the efficient retrieval of multimedia objects. In this paper we propose a generic architecture for context-aware retrieval of multimedia objects. The proposed framework combines the well-known approaches of text-based retrieval and context-aware retrieval to formulate an architecture for accurate retrieval of multimedia data.
Abstract: Due to new distributed database applications such as
huge deductive database systems, search complexity is constantly
increasing and we need better algorithms to speed up traditional
relational database queries. An optimal dynamic programming
method for such high-dimensional queries has the major disadvantage of
exponential complexity, and thus we are interested in semi-optimal but
faster approaches. In this work we present a multi-agent based
mechanism to meet this demand and compare its results with
those of some commonly used query optimization algorithms.
Abstract: This paper proposes rough set models with knowledge
granules at three different levels for incomplete information systems
under a tolerance relation defined by the similarity between objects
according to their attribute values. By introducing a dominance relation
on the universe of discourse, similarity classes are decomposed into
three subclasses (a little-better subclass, a little-worse subclass and a
vague subclass), and the lower and upper approximations are
correspondingly split into three components. Using these components,
information can be retrieved effectively to find naturally hierarchical
expansions of queries and to construct answers to elaborative queries.
The approach is illustrated by applying the rough set models to the
design of an information retrieval system that accesses documents
expanded at different granularities. The proposed method enhances the
application of rough set models in information retrieval by making
query expansion and elaborative queries more flexible.
Abstract: Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Several strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy, using an unsupervised learning method for disambiguation. We report our investigation of applying Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning method, to the task of Thai noun and verb word sense disambiguation. Latent Semantic Indexing has been shown to be efficient and effective for information retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, /hua4/ and /kep1/, which are used as representatives of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.
Abstract: The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available. However, when chatterbots became more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions while achieving high-quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and brings forward the call for new standard measures and related considerations. As a short-term solution for evaluating the response quality of conversational agents, and to demonstrate the challenges in evaluating systems of different natures, this research proposes a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems: AnswerBus, START and AINI.
Abstract: The number of documents being created is growing at an
increasing pace, yet most of them cover already known topics and
few introduce new concepts. This has started a new era in the
information retrieval discipline with its own particular requirements:
digging into topics and concepts and finding subtopics or relations
between topics. Until now, IR research has focused on retrieving
documents about a general topic or clustering documents under generic
subjects. However, these conventional approaches cannot go deep into
the content of documents, which makes it difficult for people to reach
the documents they are searching for. We therefore need new ways of
mining document sets in which the critical point is to know much about
the contents of the documents. As a solution, we propose to enhance LSI,
one of the proven IR techniques, by supporting its vector space with
n-gram forms of words. The positive results we have obtained are shown
in two application areas of the IR domain: querying a document
database and clustering the documents in that database.
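As a rough illustration of what "supporting the vector space with n-gram forms of words" could look like, the sketch below builds a term-by-document count matrix whose terms include word bigrams as well as single words. The abstract does not say whether word or character n-grams are meant, so word n-grams are an assumption here, and the function names `ngram_terms` and `term_document_matrix` are hypothetical. Such a matrix would then be the input to the LSI decomposition.

```python
from collections import Counter

def ngram_terms(text, max_n=2):
    """Return word unigrams plus word n-grams up to max_n (assumed n-gram form)."""
    tokens = text.lower().split()
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms

def term_document_matrix(docs, max_n=2):
    """Build a sorted vocabulary and a term-by-document count matrix (list of rows)."""
    counts = [Counter(ngram_terms(d, max_n)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in vocab]
    return vocab, matrix

# Toy documents, invented for illustration only.
docs = ["latent semantic indexing", "semantic indexing of documents"]
vocab, matrix = term_document_matrix(docs)
```

With bigrams in the vocabulary, the two toy documents share the term "semantic indexing" in addition to the individual words, which gives the vector space a phrase-level signal that single-word terms lack.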
Abstract: The development of distributed systems has been affected by the need to accommodate an increasing degree of flexibility, adaptability, and autonomy. The Mobile Agent technology is emerging as an alternative to build a smart generation of highly distributed systems. In this work, we investigate the performance aspect of agent-based technologies for information retrieval. We present a comparative performance evaluation model of Mobile Agents versus Remote Method Invocation by means of an analytical approach. We demonstrate the effectiveness of mobile agents for dynamic code deployment and remote data processing by reducing total latency and at the same time producing minimum network traffic. We argue that exploiting agent-based technologies significantly enhances the performance of distributed systems in the domain of information retrieval.
Abstract: We present the results of a comparative study of
several techniques from the literature related to the relevance
feedback mechanism in the case of short-term learning. Only one
of the methods considered here, the K-nearest neighbors algorithm
(KNN), belongs to the data mining field; the rest belong purely to
the information retrieval field and fall under three major axes:
query shifting, feature weighting and optimization of the
parameters of the similarity metric. As a contribution, in addition to
the comparison, we propose a new version of the KNN
algorithm, referred to as incremental KNN, which differs from
the original version in that, besides the influence of the
seeds, the rating of the current target image is also influenced by the
images already rated. The results presented here were obtained
from experiments conducted on the Wang database for one iteration,
using color moments in the RGB space. This compact
descriptor, color moments, is adequate for the efficiency
needed in interactive systems. The results obtained allow
us to claim that the proposed algorithm performs well; it even
outperforms a wide range of techniques available in the literature.
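To make the color moments descriptor mentioned above concrete, here is a minimal sketch that computes three moments per RGB channel, giving a nine-value feature vector. The abstract does not state which moments were used; the common choice of mean, standard deviation and the cube root of the third central moment is assumed, and `color_moments` is a hypothetical name.

```python
import math

def color_moments(pixels):
    """Return [mean, std, skew] for each of the R, G, B channels (9 values).

    `pixels` is a non-empty list of (r, g, b) tuples. The choice of the first
    three moments is an assumption, not taken from the abstract.
    """
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the direction of the skew.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, math.sqrt(variance), skew])
    return features

# A flat single-colour image has zero spread and zero skew in every channel.
flat = color_moments([(10, 20, 30)] * 4)
```

Nine numbers per image is what makes the descriptor cheap enough for an interactive feedback loop: distances between images reduce to comparing two short vectors.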
Abstract: Methods for organizing web data into groups in order
to analyze web-based hypertext data and facilitate data availability
are very important given the number of documents available
online. The task of clustering web-based document structures
has many applications, e.g., improving information retrieval on the
web, better understanding user navigation behavior, improving the
servicing of web users' requests, and increasing web information accessibility.
In this paper we investigate a new approach for clustering web-based
hypertexts on the basis of their graph structures. The hypertexts are
represented as so-called generalized trees, which are more general
than the usual directed rooted trees, e.g., DOM-Trees. As an important
preprocessing step we measure the structural similarity between the
generalized trees on the basis of a similarity measure d. Then,
we apply agglomerative clustering to the obtained similarity matrix
in order to create clusters of hypertext graph patterns representing
navigation structures. In the present paper we run our approach
on a data set of hypertext structures and obtain good results in
Web Structure Mining. Furthermore, we outline the application of
our approach in Web Usage Mining as future work.
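The clustering step described above, agglomerative clustering applied to a precomputed similarity matrix, can be sketched as follows. This is only an illustration: the abstract does not define the measure d or the linkage criterion, so the matrix values are invented, average linkage is an assumption, and the stopping `threshold` parameter is hypothetical.

```python
def agglomerative(sim, threshold):
    """Merge the most similar clusters until no pair reaches `threshold`.

    `sim` is a symmetric similarity matrix (list of lists). Average linkage
    is assumed; the source does not state the linkage criterion.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented similarities between four hypertext graphs: 0 and 1 are alike, 2 and 3 are alike.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = agglomerative(sim, threshold=0.5)
```

Because the procedure only reads the matrix, any structural similarity measure on generalized trees could be plugged in without changing the clustering code.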
Abstract: In the field of concepts, the measure of Wu and Palmer [1] has the advantage of being simple to implement and performing well compared with other similarity measures [2]. Nevertheless, the Wu and Palmer measure has the following disadvantage: in some situations, the similarity of two elements of an IS-A ontology located in neighboring hierarchies exceeds the similarity of two elements contained in the same hierarchy. This behavior is inadequate within the information retrieval framework. To overcome this problem, we propose a new similarity measure based on the Wu and Palmer measure. Our objective is to obtain realistic results for concepts not located in the same hierarchy. The obtained results show that, compared with the Wu and Palmer approach, our measure offers a gain in terms of relevance and execution time.
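The disadvantage described above can be reproduced with a small sketch of the standard Wu and Palmer measure, sim(c1, c2) = 2 * depth(lcs) / (depth(c1) + depth(c2)), where lcs is the deepest common ancestor. The toy IS-A hierarchy, the convention that the root has depth 1, and the function names are assumptions for illustration; the abstract's proposed replacement measure is not specified and is not implemented here.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (node first)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def wu_palmer(c1, c2, parent):
    """Standard Wu and Palmer similarity on a tree given as a child -> parent map."""
    ancestors1 = set(path_to_root(c1, parent))
    # First ancestor of c2 (walking upward) that is also an ancestor of c1.
    lcs = next(n for n in path_to_root(c2, parent) if n in ancestors1)
    depth = lambda n: len(path_to_root(n, parent))  # root has depth 1 (assumed)
    return 2 * depth(lcs) / (depth(c1) + depth(c2))

# Hypothetical IS-A hierarchy: entity > animal > mammal > {dog > poodle, cat}
parent = {
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "poodle": "dog",
}

same_hierarchy = wu_palmer("poodle", "animal", parent)  # ancestor and descendant
neighbor = wu_palmer("poodle", "cat", parent)           # concepts on different branches
```

In this toy tree the neighboring pair (poodle, cat) scores 6/9, higher than the ancestor-descendant pair (poodle, animal) at 4/7, which is the kind of ordering the abstract calls inadequate for information retrieval.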
Abstract: Document retrieval in Information Retrieval
Systems (IRS) is generally about understanding the
information in the documents concerned. The better the system
is able to understand the contents of documents, the more
effective the retrieval outcomes will be. But understanding the
contents is a very complex task. Conventional IRS apply algorithms
that can only approximate the meaning of document contents through
a keyword approach using the vector space model. Keywords may be
unstemmed or stemmed. When keywords are stemmed and conflated
in the retrieval process, we are a step closer to applying semantic
technology in IRS. Word stemming is a process in morphological
analysis under natural language processing, performed before syntactic
and semantic analysis. We have developed stemming algorithms for Malay
and Arabic and incorporated stemming in our experimental systems in
order to measure retrieval effectiveness. The results show that
retrieval effectiveness increases when stemming is used in
the systems.
Abstract: Today, Genetic Algorithms are used to solve a
wide range of optimization problems. Some research has been conducted on
applying Genetic Algorithms to text classification, summarization
and information retrieval systems in the text mining process. This
research shows better performance due to the nature of Genetic
Algorithms. In this paper, a new algorithm for using a Genetic
Algorithm in concept weighting and topic identification, based on
concept standard deviation, is explored.
Abstract: As the web continues to grow exponentially, the idea
of crawling the entire web on a regular basis becomes less and less
feasible; to cover the information on a specific domain,
domain-specific search engines were proposed. As more information
becomes available on the World Wide Web, it becomes more difficult
to provide effective search tools for information access. Today,
people access web information through two main kinds of search
interfaces: Browsers (clicking and following hyperlinks) and Query
Engines (queries in the form of a set of keywords showing the topic
of interest) [2]. Better support is needed for expressing one's
information need and for returning high-quality search results from web
search tools. There appears to be a need for systems that reason
under uncertainty and are flexible enough to recover from the
contradictions, inconsistencies, and irregularities that such reasoning
involves. In a multi-view problem, the features of the domain can be
partitioned into disjoint subsets (views) that are sufficient to learn the
target concept. Semi-supervised, multi-view algorithms, which
reduce the amount of labeled data required for learning, rely on the
assumptions that the views are compatible and uncorrelated. This
paper describes the use of a semi-structured machine learning approach
with active learning for domain-specific search engines. A
domain-specific search engine is “an information access system that
allows access to all the information on the web that is relevant to a
particular domain”. The proposed work shows that, with the help of
this approach, relevant data can be extracted with the minimum number
of queries fired by the user. It requires a small amount of labeled data
and a pool of unlabeled data on which the learning algorithm is applied
to extract the required data.
Abstract: One of the ubiquitous routines in medical practice is searching through voluminous piles of clinical documents. In this paper we introduce a distributed system to search and exchange clinical documents. Clinical documents are distributed peer-to-peer. Relevant information is found in multiple iterations of cross-searches between the clinical text and its domain encyclopedia.
Abstract: Nowadays social media are important tools for web
resource discovery. The performance and capabilities of web searches
are vital, especially for search results from social research paper
bookmarking. This paper proposes a new ranking method, CSTRank,
that combines similarity ranking with the time at which a paper was
posted. The paper posted time serves as a static ranking for
improving search results. In this study, the paper posted
time is combined with similarity ranking to produce a better ranking
than other methods such as similarity ranking alone (SimRank). The
retrieval performance of the combined rankings is evaluated using
mean values of NDCG. The experimental evaluation indicates
that CSTRank with a weight ratio of 90:10
can improve the efficiency of research paper searching on social
bookmarking websites.
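A weighted combination of a similarity score with a posted-time score, evaluated with NDCG, might look like the sketch below. Several details are assumptions: the abstract does not say which component receives the 90 and which the 10 (similarity is given 0.9 here), the recency score is assumed to be normalised to [0, 1], and the names `combined_score`, `dcg` and `ndcg` are hypothetical. The NDCG used is the common log2-discounted form.

```python
import math

def combined_score(similarity, recency, w_sim=0.9, w_time=0.1):
    """Weighted sum of similarity and recency; the 0.9/0.1 assignment is assumed."""
    return w_sim * similarity + w_time * recency

def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Invented papers: (similarity, recency, graded relevance judgement).
papers = [(0.80, 0.1, 2), (0.78, 0.9, 3), (0.40, 0.5, 0)]
ranked = sorted(papers, key=lambda p: combined_score(p[0], p[1]), reverse=True)
quality = ndcg([rel for _, _, rel in ranked])
```

In this toy data the more recent paper overtakes a slightly more similar but older one, and the resulting order matches the ideal order of the relevance judgements.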
Abstract: In this paper, a model for an information retrieval
system is proposed which takes into account that knowledge about
documents and the information needs of users are dynamic. Two
methods are combined, one qualitative or symbolic and the other
quantitative or numeric, which are deemed suitable for many
clustering contexts, data analysis, concept exploration and
knowledge discovery. These two methods may be classified as
inductive learning techniques. In this model, they are introduced to
build “long-term” knowledge about past queries and concepts in a
collection of documents. The “long-term” knowledge can guide
and assist the user in formulating an initial query and can be
exploited in the process of retrieving relevant information. The
different kinds of knowledge are organized according to different
points of view. This may be considered an enrichment of the exploration
level which is coherent with the concept of document/query
structure.
Abstract: This study investigates the use of genetic algorithms
in information retrieval. The method is shown to be applicable to
three well-known document collections, where more relevant
documents are presented to users after the genetic modification. In this
paper we present a new fitness function for approximate information
retrieval which is faster and more flexible than the cosine similarity
fitness function.
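For reference, the baseline that the abstract compares against, a cosine similarity fitness, can be sketched as below. The abstract does not define the proposed new fitness function, so only the baseline is shown; representing a chromosome as a query weight vector and averaging over relevant documents are assumptions, and the function names are hypothetical.

```python
import math

def cosine_similarity(query, doc):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

def cosine_fitness(chromosome, relevant_docs):
    """Assumed baseline fitness: mean cosine similarity to the relevant documents."""
    if not relevant_docs:
        return 0.0
    return sum(cosine_similarity(chromosome, d) for d in relevant_docs) / len(relevant_docs)

# Invented term weights for one candidate query and two relevant documents.
fitness = cosine_fitness([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Each evaluation needs a dot product and two norms per document, which is the per-candidate cost a faster fitness function would have to undercut.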
Abstract: This paper presents a digital engineering library, the
Digital Mechanism and Gear Library (DMG-Lib), which provides a
multimedia collection of e-books, pictures, videos and animations in
the domain of mechanisms and machines. The distinguishing
characteristic of DMG-Lib is the enrichment and cross-linking of the
different sources. DMG-Lib e-books present not only pages as pixel
images but also selected figures augmented with interactive animations.
The presentation of animations in e-books increases the clarity of the
information.
To present the multimedia e-books and make them available in the
DMG-Lib internet portal, a special e-book reader called StreamBook
was developed for optimal presentation of digitized books and to
enable reading the e-books as well as working efficiently and
individually with the enriched information. The objective is to support
different user tasks ranging from information retrieval to the
development and design of mechanisms.