Abstract: Over the past decade, there has been a steep rise in
data-driven analysis in major areas of medicine, such as clinical
decision support systems, survival analysis, patient similarity
analysis, and image analytics. Most of the data in the field are well
structured and available in numerical or categorical formats that can
be used directly in experiments. At the opposite end of the spectrum,
however, lies a wide expanse of data that is intractable for direct
analysis because it is unstructured: discharge summaries, clinical
notes, and procedural notes written in human narrative form, with
neither a relational model nor a standard grammatical structure. An
important step in using these texts for such studies is to transform
and process them, retrieving structured information from the haystack
of irrelevant data with information retrieval and data mining
techniques. To address this problem, the authors present Q-Map, a
simple yet robust system that can sift through massive datasets with
unregulated formats to retrieve structured information aggressively
and efficiently. It is backed by an effective mining technique based
on a string matching algorithm indexed on curated knowledge sources,
which is both fast and configurable. The authors also briefly compare
its performance with that of MetaMap, one of the most reputed tools
for medical concept retrieval, and present the advantages Q-Map
displays over MetaMap.
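The abstract does not spell out Q-Map's matching algorithm. As a rough illustration of dictionary-indexed string matching in general, the Python sketch below indexes a small lexicon by token sequence and scans a note from left to right. At each position it prefers the longest lexicon term that starts there. The lexicon, the concept IDs (`C-HF`, `C-CP`, ...), and the function names are hypothetical. A real system would load its terms from a curated knowledge source and would need richer text normalization than this.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(lexicon):
    """Index each lexicon term (as a token tuple) to its concept ID; also return the longest term length."""
    index = {tuple(tokenize(term)): concept for term, concept in lexicon.items()}
    max_len = max((len(key) for key in index), default=0)
    return index, max_len

def extract_concepts(text, index, max_len):
    """Greedy left-to-right, longest-match lookup of lexicon terms in free text."""
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so "heart failure" wins over "heart".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in index:
                matches.append((" ".join(span), index[span]))
                i += n
                break
        else:
            i += 1  # no lexicon term starts here; move one token to the right
    return matches

# Hypothetical lexicon and concept IDs, standing in for a curated knowledge source.
LEXICON = {"heart failure": "C-HF", "heart": "C-HEART", "chest pain": "C-CP", "aspirin": "C-ASA"}
index, max_len = build_index(LEXICON)
note = "Pt admitted with chest pain; hx of heart failure. Started aspirin."
matches = extract_concepts(note, index, max_len)
print(matches)
```

Because matching is longest-first, "heart failure" is not also reported as "heart". Whether Q-Map resolves overlapping matches this way is an assumption.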
Abstract: With the emergence and development of Information
and Communications Technologies (ICTs), Higher Education is
experiencing rapid changes, not only in its teaching strategies but
also in students’ learning skills. However, we have noticed that
students often have difficulty finding innovative, useful, and
interesting learning resources for their work. This is due to a
lack of supervision in the selection of good query tools. This paper
presents AINA, an Information Retrieval (IR) computer system aimed
at providing motivating and stimulating content to both students
and teachers working in different areas and at different educational
levels. In particular, our proposal consists of an open virtual resource
environment oriented toward the vast universe of Disney comics and
cartoons. Our test suite includes Disney’s feature-length and short
films, and we have carried out several activities based on the
Just-in-Time Teaching (JiTT) methodology. The system has been tested
by groups of university and secondary school students.
Abstract: When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of that information. This paper investigates how an IRS can be optimized to rank results for individual information needs in order of relevance. The study addresses the development of algorithms that optimize the ranking of documents retrieved by an IRS, and it describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in Internet-based or designated-database environments. As the volume of information available online and in designated databases grows continuously, ranking algorithms can play a major role in the quality of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and query vectors. This is based on calculating the weight (
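The DROPT weighting formula is cut off above, so the following is not DROPT itself. It is a minimal sketch of the standard vector-space setup the abstract alludes to. Documents and the query become TF-IDF weighted keyword vectors, and documents are ranked by their cosine similarity to the query. The smoothed IDF (`log(N/df) + 1`), the whitespace tokenization, and the sample documents are assumptions chosen for brevity.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tfidf_vectors(docs):
    """Weight each document's keywords by TF-IDF; return the vectors and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (doc_index, score) pairs sorted from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    query_tf = Counter(tokenize(query))
    # Query terms unseen in the corpus carry no weight and are dropped.
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

DOCS = [
    "genetic algorithm for ranking",
    "document ranking with query vectors",
    "water pipeline design",
]
for doc_id, score in rank("document ranking", DOCS):
    print(doc_id, round(score, 3))
```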
Abstract: The centuries-long growth in the volume of text data, such
as books and articles in libraries, has made it necessary to establish
effective mechanisms for locating them. Early techniques such as
abstracting, indexing, and the use of classification categories
marked the birth of a new field of research called "Information
Retrieval". Information Retrieval (IR) can be defined as the task of
defining models and systems whose purpose is to facilitate access to
a set of documents in electronic form (a corpus) so that users can
find the documents relevant to them, that is, the content that matches
their information needs. This paper presents a new semantic indexing
approach for a document corpus. The indexing process begins with a
term weighting phase that determines the importance of each term in
the documents. A thesaurus such as WordNet is then used to move to
the conceptual level. Each candidate concept is evaluated by how well
it represents the document, that is, by its importance relative to the
other concepts of the document. Finally, the semantic index is built
by attaching to each concept of the ontology the corpus documents in
which that concept is found.
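Here is a minimal sketch of the three indexing steps described above: term weighting, lifting terms to concepts, and keeping only the concepts that represent a document well enough. It assumes a tiny hand-written `THESAURUS` dictionary in place of WordNet, TF-IDF-style term weights, and a hypothetical `threshold` on each concept's share of the document's total concept weight. The abstract does not give the paper's actual weighting or evaluation formulas.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-thesaurus standing in for WordNet: term -> concept label.
THESAURUS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "road": "route", "highway": "route",
    "engine": "motor", "motor": "motor",
}

def term_weights(docs):
    """Step 1: TF-IDF-style weight of each term in each document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [{t: tf * math.log(1 + n / df[t]) for t, tf in Counter(toks).items()}
            for toks in tokenized]

def semantic_index(docs, threshold=0.2):
    """Steps 2-3: lift terms to concepts, keep representative concepts, index concept -> documents."""
    index = defaultdict(set)
    for doc_id, weights in enumerate(term_weights(docs)):
        concept_weight = Counter()
        for term, w in weights.items():
            if term in THESAURUS:
                concept_weight[THESAURUS[term]] += w
        total = sum(concept_weight.values())
        for concept, w in concept_weight.items():
            # Keep a concept only if it carries enough of the document's total concept weight.
            if w / total >= threshold:
                index[concept].add(doc_id)
    return dict(index)

DOCS = ["the car left the highway", "an automobile engine and a truck", "the road was empty"]
print(semantic_index(DOCS))
```

Document 1 reaches the `vehicle` concept through two different words ("automobile" and "truck"). This shows how moving to the conceptual level lets synonyms reinforce each other.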
Abstract: The centuries-long growth in the volume of text data, such
as books and articles in libraries, has made it necessary to establish
effective mechanisms for locating them. Early techniques such as
abstracting, indexing, and the use of classification categories
marked the birth of a new field of research called "Information
Retrieval". Information Retrieval (IR) can be defined as the task of
defining models and systems whose purpose is to facilitate access to
a set of documents in electronic form (a corpus) so that users can
find the documents relevant to them, that is, the content that matches
their information needs.
Most information retrieval models index a corpus with a specific data
structure called an "inverted file" or "reverse index".
For every term in the corpus, this inverted file records the
identifiers of the documents that contain the term, the frequency of
the term in each document of the corpus, the positions of its
occurrences...
In this paper we use an object-oriented database (db4o) instead of
the inverted file; that is, instead of searching for a term in the
inverted file, we search for it in the db4o database.
The purpose of this work is to carry out a comparative study, on a
large volume of data, of whether object-oriented databases can compete
with the inverted index in terms of access speed and resource
consumption.
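For reference, the inverted file described above can be sketched in Python as a positional index. Each term maps to the documents that contain it, and each posting keeps the term's positions in that document. The in-document frequency is simply the number of positions. This in-memory dictionary only illustrates the data structure that the db4o database is compared against. It says nothing about how either one was implemented in the paper, and the function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def lookup(index, term):
    """Return {doc_id: (frequency, positions)} for a term; empty if absent."""
    postings = index.get(term.lower(), {})
    return {doc_id: (len(positions), positions) for doc_id, positions in postings.items()}

DOCS = [
    "information retrieval uses an inverted file",
    "an inverted file maps terms to documents",
    "object databases store objects as objects",
]
index = build_inverted_file(DOCS)
print(lookup(index, "objects"))  # {2: (2, [3, 5])}
```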
Abstract: The Genetic Algorithm (GA) is a powerful technique for solving optimization problems. It follows the idea of survival of the fittest: progressively better solutions evolve from previous generations until a near-optimal solution is obtained. GA uses three main operations (selection, crossover, and mutation) to produce new generations from old ones. GA has been widely used to solve optimization problems in many applications, such as the traveling salesman problem, airport traffic control, information retrieval (IR), reactive power optimization, job shop scheduling, and hydraulic systems such as water pipeline systems. In water pipeline systems, several goals must be achieved optimally, such as minimum construction cost, minimum pipe lengths and diameters, and the best placement of protection devices. GA shows high performance compared with other optimization techniques; moreover, it is easy to implement and use, and it searches only a limited number of solutions.
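To make the three operations concrete, here is a minimal GA sketch on the toy OneMax problem, where the goal is to maximize the number of 1-bits in a bit string. It uses tournament selection, one-point crossover, bit-flip mutation, and elitism. OneMax and all parameter values are assumptions that stand in for a real objective such as pipeline construction cost. A run creates only about `pop_size × generations` candidates out of the `2**n_bits` possible bit strings. This is one reading of the claim that GA searches a limited number of solutions.

```python
import random

def fitness(bits):
    """OneMax: count the 1-bits (a toy stand-in for a real objective)."""
    return sum(bits)

def tournament(pop, rng, k=3):
    """Selection: pick k random individuals and return the fittest."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover: prefix of parent a, suffix of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate):
    """Bit-flip mutation: flip each bit independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]

def genetic_algorithm(n_bits=20, pop_size=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

best, history = genetic_algorithm()
print(fitness(best), history[:5])
```

The best individual is carried forward unchanged (elitism), so the best fitness can never decrease from one generation to the next.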
Abstract: Knowledge discovery from text and ontology learning
are relatively new fields. However, they are already used in many
fields, such as Information Retrieval (IR) and related domains. IR
systems based on Human Plausible Reasoning (HPR), for example, need
an underlying knowledge base, which is currently built by hand. In
this paper we propose an architecture based on ontology learning
methods to automatically generate the needed HPR knowledge base.
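The abstract does not say which ontology learning methods the architecture uses. One common technique for extracting generalization (is-a) relations from raw text is lexico-syntactic pattern matching in the style of Hearst patterns. The sketch below applies two such patterns: "X such as A, B and C" and "A and other X". For illustration only, we assume that is-a pairs of this kind are among the statements an HPR knowledge base would hold. The patterns handle single-word terms only. A practical pipeline would use a part-of-speech tagger or chunker to capture multi-word noun phrases.

```python
import re
from collections import defaultdict

# A comma/and/or list of single-word terms: "a", "a and b", "a, b and c", "a, b, and c".
_LIST = r"\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?"
SUCH_AS = re.compile(r"\b(\w+),? such as (" + _LIST + r")")     # "X such as A, B and C"
AND_OTHER = re.compile(r"\b(\w+(?:, \w+)*),? and other (\w+)")  # "A, B and other X"

def _terms(items):
    """Split a matched list into its terms, dropping the conjunctions."""
    return {w for w in re.findall(r"\w+", items) if w not in ("and", "or")}

def extract_is_a(text):
    """Return {hypernym: set of hyponyms} found by the two patterns."""
    relations = defaultdict(set)
    text = text.lower()
    for hypernym, items in SUCH_AS.findall(text):
        relations[hypernym] |= _terms(items)
    for items, hypernym in AND_OTHER.findall(text):
        relations[hypernym] |= _terms(items)
    return dict(relations)

sample = ("Clinical notes mention drugs such as aspirin, heparin, and warfarin. "
          "The patient had diabetes and other conditions.")
print(extract_is_a(sample))
```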
Abstract: Text categorization techniques are widely used in many Information Retrieval (IR) applications. In this paper, we propose a simple but efficient method that automatically finds the relationship between any term and any document, and that establishes an indexing matrix for text categorization. We call this method the Indexing Matrix Categorization Machine (IMCM). Several experiments are conducted to show the efficiency and robustness of our algorithm.
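The abstract gives no details of how IMCM builds its indexing matrix, so the following is a hypothetical stand-in, not IMCM. It builds a term-by-category matrix whose entries are the fraction of a term's training occurrences that fall in each category. A new document is assigned to the category whose summed term weights are highest. The whitespace tokenization, the example categories, and the function names are assumptions.

```python
from collections import Counter, defaultdict

def train_indexing_matrix(labeled_docs):
    """Build matrix[term][category]: share of the term's training occurrences seen in that category."""
    counts = defaultdict(Counter)
    for text, category in labeled_docs:
        for term in text.lower().split():
            counts[term][category] += 1
    matrix = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        matrix[term] = {c: n / total for c, n in per_category.items()}
    return matrix

def categorize(text, matrix):
    """Sum the matrix rows of the document's terms; return the best category, or None if no term is known."""
    scores = Counter()
    for term in text.lower().split():
        for category, weight in matrix.get(term, {}).items():
            scores[category] += weight
    return scores.most_common(1)[0][0] if scores else None

TRAIN = [
    ("gene mutation crossover", "ga"),
    ("selection crossover fitness", "ga"),
    ("query ranking index", "ir"),
    ("inverted index posting", "ir"),
]
matrix = train_indexing_matrix(TRAIN)
print(categorize("crossover fitness", matrix))
```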