Abstract: Search is the most obvious application of information
retrieval. The variety of widely available biomedical data is
enormous and expanding fast. This expansion renders existing
techniques insufficient for extracting the most interesting patterns
from a collection according to user requirements. Recent research
concentrates more on semantic-based searching than on traditional
term-based searches. Algorithms for semantic search are
implemented based on the relations that exist between the words of
the documents. Ontologies are used as domain knowledge for
identifying semantic relations as well as for structuring the data for
effective information retrieval. Annotating data with ontology
concepts is one of the most widespread practices for clustering
documents. In this paper, concept-based and annotation-based
indexing are proposed for clustering biomedical documents. The
fuzzy c-means (FCM) clustering algorithm is used to cluster the
documents. The performance of the proposed methods is compared
with traditional term-based clustering on PubMed articles from five
different disease communities. The experimental results show that
the proposed methods outperform term-based fuzzy clustering.
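The fuzzy c-means step can be sketched generically; the toy 2-D vectors below stand in for the paper's concept-indexed document vectors, and the implementation is a standard FCM, not the authors' exact pipeline.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal FCM: returns the membership matrix U (n x c) and centroids.
    m is the fuzzifier; each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # normalize memberships
    for _ in range(n_iter):
        W = U ** m                               # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-9
        U = 1.0 / d ** (2.0 / (m - 1.0))         # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return U, centroids

# Two separable groups of toy "document vectors": FCM should assign
# high membership within each group.
X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
U, C = fuzzy_c_means(X, c=2)
```

Unlike hard k-means, each document keeps a graded membership in every cluster, which suits documents annotated with several ontology concepts.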
Abstract: In Content-Based Image Retrieval systems it is
important to use an efficient indexing technique in order to perform
and accelerate search in huge databases. The indexing technique
used should also support the high dimensionality of image features.
In this paper we present the hierarchical index NOHIS-tree (Non
Overlapping Hierarchical Index Structure) and evaluate it when
scaling up to very large databases. We also present a study of the
influence of clustering on search time. The performance test results
show that NOHIS-tree outperforms the SR-tree. Tests also show that
NOHIS-tree maintains its performance in high-dimensional spaces.
We also include performance tests that determine the number of
clusters in NOHIS-tree that yields the best search time.
Abstract: The novelty proposed in this study is twofold: a new color similarity metric based on the human visual system and a new color indexing method based on a textual approach. The proposed color similarity metric is grounded in the color perception of the human visual system, so the results returned by the indexing system can fulfill user expectations as much as possible. We developed a web application to collect users' judgments about the similarities between colors, and these judgments are used to estimate the metric proposed in this study. To index an image's colors, we use a text indexing engine, which eases the integration of visual features into a database of text documents. The textual signature is built by weighting the image's colors according to their occurrence in the image. The use of a textual indexing engine provides a simple, fast and robust solution for indexing images. A typical use of the proposed system is the development of applications whose data are both visual and textual. To evaluate the proposed method we chose a price comparison engine as a case study, collecting a series of commercial offers, each containing a textual description and an image representing the offer.
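The idea of weighting an image's colors into a textual signature can be illustrated roughly as below; the palette, the quantization, and the frequency weighting are hypothetical stand-ins, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical coarse palette; the paper's actual color vocabulary and
# perceptual metric are not specified in the abstract.
PALETTE = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
           "black": (0, 0, 0), "white": (255, 255, 255)}

def nearest_color(pixel):
    """Map an RGB pixel to the closest palette term (squared Euclidean
    distance; the paper would use its perceptual similarity metric here)."""
    return min(PALETTE, key=lambda name: sum((p - q) ** 2
                                             for p, q in zip(pixel, PALETTE[name])))

def textual_signature(pixels):
    """Weight each color term by its relative occurrence in the image."""
    counts = Counter(nearest_color(p) for p in pixels)
    total = sum(counts.values())
    return {name: counts[name] / total for name in counts}

pixels = [(250, 5, 5)] * 3 + [(10, 10, 245)]   # mostly red, one blue pixel
sig = textual_signature(pixels)                # → {'red': 0.75, 'blue': 0.25}
```

The resulting term-weight pairs can be fed to an ordinary text indexing engine alongside the offer's textual description.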
Abstract: The various types of frequent pattern discovery
problems, namely frequent itemset, sequence and graph mining, are
solved in different ways that are nonetheless similar in certain
respects. Approaches to discovering such patterns can be classified
into two main classes: level-wise methods and database
projection-based methods. Level-wise algorithms generally use
clever indexing structures for discovering the patterns. In this paper
a new approach is proposed for efficiently discovering frequent
sequences and tree-like patterns, based on the level-wise scheme.
Because level-wise algorithms spend a great deal of time on
subpattern testing, the new approach introduces the idea of using
automaton theory to solve this problem.
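The automaton idea for subpattern testing can be illustrated for sequences: the matcher below is a simple deterministic automaton whose state is the number of pattern items matched so far. It is a generic sketch of subsequence containment and support counting, not the paper's algorithm.

```python
def contains_subsequence(sequence, pattern):
    """Run an automaton over the sequence: the state counts how many
    pattern items have been matched in order; accept when all are matched."""
    state = 0
    for item in sequence:
        if state < len(pattern) and item == pattern[state]:
            state += 1
    return state == len(pattern)

def support(database, pattern):
    """Level-wise support counting: number of sequences containing the pattern."""
    return sum(contains_subsequence(s, pattern) for s in database)

db = [list("abcb"), list("acbd"), list("bbd")]
support(db, list("ab"))   # → 2: 'abcb' and 'acbd' contain 'a' before 'b'
```

One linear pass per sequence replaces repeated scanning, which is exactly the cost that dominates level-wise candidate testing.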
Abstract: The image segmentation method described in this
paper has been developed as a pre-processing stage for use in
methodologies and tools for content-based video/image indexing
and retrieval. The method solves the problem of extracting whole
objects from the background, producing images of single complete
objects from videos or photos. The extracted images are used for
calculating the object visual features necessary for both the indexing
and retrieval processes.
The segmentation algorithm is based on the cooperation of an
optical flow evaluation method with edge detection and region
growing procedures. The optical flow estimator belongs to the class
of differential methods. It can detect motions ranging from a
fraction of a pixel to a few pixels per frame, achieves good results in
the presence of noise without a pre-filtering stage, and includes a
specialised model for moving object detection.
The first task of the presented method exploits cues from motion
analysis to detect moving areas. Objects and background are then
refined using edge detection and seeded region growing procedures,
respectively. All the tasks are performed iteratively until objects and
background are completely resolved.
The method has been applied to a variety of indoor and outdoor
scenes in which objects of different types and shapes appear on
variously textured backgrounds.
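Of the cooperating procedures, seeded region growing is the simplest to sketch. Below is a generic 4-connected version on a toy intensity grid; the tolerance rule and seed choice are illustrative, as the abstract does not specify the paper's parameters.

```python
from collections import deque

def region_grow(image, seed, tol):
    """Seeded region growing: starting from the seed pixel, absorb
    4-connected neighbours whose intensity is within `tol` of the seed."""
    h, w = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region, frontier = {seed}, deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in region \
                    and abs(image[nr][nc] - seed_val) <= tol:
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region

# A bright 2x2 object on a dark background; growing from (0, 0)
# stays on the object.
img = [[200, 200, 10],
       [200, 200, 10],
       [10,  10,  10]]
obj = region_grow(img, (0, 0), tol=30)
```

In the paper's scheme, the seeds would come from the motion analysis step, and the grown regions would be refined against the detected edges.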
Abstract: For a spatiotemporal database management system,
the I/O cost of queries and other operations is an important
performance criterion. To optimize this cost, intense research on
designing robust index structures has been conducted in the past
decade. Beyond these major considerations, other design issues
deserve attention because of their direct impact on I/O cost. In
particular, an efficient buffer management strategy plays a key role
in reducing redundant disk accesses. In this paper, we propose an
efficient buffer strategy for a spatiotemporal database index
structure, specifically one indexing objects moving over a network
of roads. The proposed strategy, namely MONPAR, is based on the
data type (i.e. spatiotemporal data) and on the structure of the
index. For the experimental evaluation, we set up a simulation
environment that counts the number of disk accesses while
executing a number of spatiotemporal range queries over the index.
We repeated the simulations with query sets of different
distributions, such as uniform and skewed query distributions.
Based on a comparison of our strategy with well-known
page-replacement techniques, such as LRU-based and priority-based
buffers, we conclude that MONPAR outperforms its competitors for
small and medium size buffers under all the query distributions used.
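The abstract does not describe MONPAR itself, but the LRU baseline it is compared against can be sketched, with miss counting mirroring the disk-access metric used in the evaluation.

```python
from collections import OrderedDict

class LRUBuffer:
    """LRU page buffer (the classic baseline): evicts the least recently
    used page when full, and counts one disk access per miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.disk_accesses = 0

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # hit: refresh recency
        else:
            self.disk_accesses += 1              # miss: one disk read
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)   # evict least recently used
            self.pages[page_id] = True

buf = LRUBuffer(capacity=2)
for p in [1, 2, 1, 3, 2]:
    buf.access(p)
# 5 requests, 1 hit (page 1), 4 disk reads
```

A spatiotemporal-aware strategy such as MONPAR would replace the recency rule with one informed by the index structure and the data type, which is what the paper evaluates.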
Abstract: A new tool path planning method for 5-axis flank
milling of a globoidal indexing cam is developed in this paper. The
globoidal indexing cam is a practical transmission mechanism due
to its high transmission speed, accuracy and dynamic performance.
Machining the cam profile is a complex and precise task. The profile
surface of the globoidal cam is generated by the conjugate contact
motion of the roller. The resulting complex profile surface is usually
machined by the 5-axis point-milling method, which is
time-consuming compared with flank milling. A tool path for 5-axis
flank milling of the globoidal cam is therefore developed to improve
cutting efficiency. The flank milling tool path is globally optimized
according to the minimum zone criterion, so high accuracy is
guaranteed. A computational example and a cutting simulation
validate the developed method.
Abstract: The most common forensic activity is searching a hard
disk for strings of data. Nowadays, investigators and analysts
increasingly face large, even terabyte-sized data sets when
conducting digital investigations, so sequential searching can take
weeks to complete. There are two primary search methods:
index-based search and bitwise search. Index-based searching is very
fast after the initial indexing, but the initial indexing takes a long
time. In this paper, we discuss a high-speed bitwise search model for
large-scale digital forensic investigations. We use a pattern matching
board, of the kind generally used for network security, to search for
strings and complex regular expressions. Our results indicate that in
many cases, the use of a pattern matching board can substantially
increase the performance of digital forensic search tools.
Abstract: With the advance of multimedia and diagnostic
imaging technologies, the number of radiographic images is
constantly increasing. The medical field demands sophisticated
systems for the search and retrieval of the produced multimedia
documents. This paper presents ongoing research that focuses on the
semantic content of radiographic image documents to facilitate
semantic-based radiographic image indexing and retrieval. The
proposed model divides a radiographic image document based on its
semantic content and converts it into a logical structure and a
semantic structure. The logical structure represents the overall
organization of information. The semantic structure, which is bound
to the logical structure, is composed of semantic objects and their
interrelationships in the various spaces of the radiographic image.
Abstract: In this work we present a new approach for automatic shot transition detection. Our approach is based on the analysis of the edges of Spatio-Temporal Video Slices (STVS) extracted from videos. The proposed approach can efficiently detect both abrupt shot transitions ('cuts') and gradual ones such as fade-in, fade-out and dissolve. Compared to other techniques, our method is distinguished by its high precision and speed. This performance is obtained by reducing the shot-boundary detection problem to a simple 2D image partitioning problem.
Abstract: Semantic query optimization consists of restricting the
search space in order to reduce the set of objects of interest for a
query. This paper presents an indexing method based on UB-trees
and on a static analysis of the constraints associated with the views
of the database and with any constraints expressed on attributes.
The result of the static analysis is a partitioning of the object space
into disjoint blocks. Through Space Filling Curve (SFC) techniques,
each fragment (block) of the partition is assigned a unique identifier,
enabling the efficient indexing of fragments by UB-trees. The search
space corresponding to a range query is thereby restricted to a
subset of the blocks of the partition. This approach has been
developed in the context of a KB-DBMS, but it can be applied to
any relational system.
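The SFC addressing that UB-trees rely on is typically a Z-order (Morton) curve; a minimal 2-D bit-interleaving sketch (generic, not the paper's code) looks like this:

```python
def z_address(x, y, bits=16):
    """Interleave the bits of (x, y) into a single Z-order (Morton) address,
    the addressing scheme UB-trees use to linearize multidimensional keys."""
    z = 0
    for i in range(bits):
        z |= (x >> i & 1) << (2 * i)       # x bits go to even positions
        z |= (y >> i & 1) << (2 * i + 1)   # y bits go to odd positions
    return z

# Nearby points receive nearby Z-addresses, so a range query maps to a
# small set of Z-intervals that a B-tree variant can scan efficiently.
z_address(0, 0)   # → 0
z_address(3, 3)   # → 15
```

Each disjoint block of the partition can then be identified by the Z-interval it covers, which is what makes UB-tree indexing of the fragments efficient.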
Abstract: Locality Sensitive Hashing (LSH) is one of the most
promising techniques for solving the nearest neighbour search
problem in high-dimensional spaces. Euclidean LSH is the most
popular variant of LSH and has been successfully applied in many
multimedia applications. However, Euclidean LSH has limitations
that affect its structure and query performance. Its main limitation
is large memory consumption: to achieve good accuracy, a large
number of hash tables is required. In this paper, we propose a new
hashing algorithm that overcomes the storage space problem and
improves query time, while keeping accuracy similar to that
achieved by the original Euclidean LSH. Experimental results on a
real large-scale dataset show that the proposed approach achieves
good performance and consumes less memory than Euclidean LSH.
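The original Euclidean LSH bucket function, whose per-table memory cost motivates the paper, has the well-known form h(v) = floor((a·v + b) / w); a minimal sketch, with illustrative parameter values:

```python
import numpy as np

def euclidean_lsh_hash(v, a, b, w):
    """E2LSH bucket: h(v) = floor((a . v + b) / w), with a ~ N(0, I) and
    b ~ Uniform[0, w). Close vectors tend to share a bucket."""
    return int(np.floor((np.dot(a, v) + b) / w))

rng = np.random.default_rng(0)
dim, w = 8, 4.0                     # illustrative dimensionality and bucket width
a = rng.standard_normal(dim)        # random projection direction
b = rng.uniform(0, w)               # random offset
v1 = rng.standard_normal(dim)
v2 = v1 + 1e-3                      # a near-duplicate of v1
h1 = euclidean_lsh_hash(v1, a, b, w)
h2 = euclidean_lsh_hash(v2, a, b, w)
```

In practice many such functions are concatenated per table and many tables are kept, which is exactly the storage cost the proposed algorithm aims to reduce.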
Abstract: Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Several strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid approaches. This paper focuses on the corpus-based strategy, which employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval and unsupervised learning technique, applied to the task of Thai noun and verb word sense disambiguation. Latent Semantic Indexing has been shown to be efficient and effective for information retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, /hua4/ and /kep1/, used as representatives of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness, and indicate the potential, of applying vector-based distributional information measures to semantic disambiguation.
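The LSI core, a truncated SVD of a term-context matrix, can be sketched on a toy example; the matrix below is invented for illustration and is unrelated to the Thai corpus used in the paper.

```python
import numpy as np

# Toy term-context matrix (rows: terms, columns: contexts of an ambiguous
# word). Contexts 0 and 1 share vocabulary (one sense); context 2 does not.
A = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 0, 3],
              [0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                       # truncation rank
docs_k = (np.diag(s[:k]) @ Vt[:k]).T        # contexts in the k-dim latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_01 = cos(docs_k[0], docs_k[1])          # same-sense contexts
sim_02 = cos(docs_k[0], docs_k[2])          # different-sense contexts
```

For disambiguation, an unlabeled context is folded into the latent space and assigned the sense whose training contexts lie closest to it.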
Abstract: Nowadays, ontologies are the only widely accepted paradigm for managing sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of ontology-based approaches, however, focus on indexing texts at the document level. Recently, with advances in ontological engineering, it became clear that information indexing can benefit greatly from the use of general-purpose ontologies, which aid the indexing of documents at the word level. This paper presents a concept indexing algorithm that adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. The algorithm uses a general-purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purposes of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91%, which is encouraging with regard to further improvement of the algorithm.
Abstract: This paper presents a simple and effective method for approximate indexing of instances for instance-based learning. The method uses an interval tree to determine a good starting point for the nearest-neighbor search, which stops when an early stopping criterion is met. The method proves very effective, especially when only the first nearest neighbor is required.
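The combination of a good starting point and an early stopping criterion can be sketched as follows; a list sorted on the first coordinate stands in for the paper's interval tree, and the stopping rule here is a generic one based on first-coordinate gaps.

```python
import bisect

def nearest_neighbor(points, q):
    """points must be sorted by first coordinate. Binary search locates a
    good starting index; we then scan outward and stop early once both
    frontier gaps along the first coordinate exceed the best distance found."""
    xs = [p[0] for p in points]
    lo = bisect.bisect_left(xs, q[0]) - 1
    hi = lo + 1
    best, best_d = None, float("inf")
    while lo >= 0 or hi < len(points):
        for i in (lo, hi):
            if 0 <= i < len(points):
                d = sum((a - b) ** 2 for a, b in zip(points[i], q)) ** 0.5
                if d < best_d:
                    best, best_d = points[i], d
        left_gap = q[0] - xs[lo] if lo >= 0 else float("inf")
        right_gap = xs[hi] - q[0] if hi < len(points) else float("inf")
        if min(left_gap, right_gap) >= best_d:   # early stopping criterion
            break
        lo, hi = lo - 1, hi + 1
    return best

pts = sorted([(0.0, 0.0), (1.0, 5.0), (2.0, 0.1), (9.0, 9.0)])
nearest_neighbor(pts, (2.0, 0.0))   # → (2.0, 0.1)
```

The payoff is that most queries examine only a handful of instances near the starting point before the stopping criterion fires.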
Abstract: As a popular rank-reduced vector space approach,
Latent Semantic Indexing (LSI) has been used in information
retrieval and other applications. In this paper, an LSI-based content
vector model for text classification is presented, which constructs
multiple augmented category LSI spaces and classifies texts by their
content. The model integrates class-discriminative information
from the training data and is equipped with several pertinent feature
selection and text classification algorithms. The proposed classifier
has been applied to email classification, and experiments on a
benchmark spam testing corpus (PU1) show that the approach is a
competitive alternative to other email classifiers based on the
well-known SVM and naïve Bayes algorithms.
Abstract: This paper describes a segmentation algorithm based
on the cooperation of an optical flow estimation method with edge
detection and region growing procedures.
The proposed method has been developed as a pre-processing
stage for use in methodologies and tools for content-based
video/image indexing and retrieval. The problem addressed consists
of extracting whole objects from the background in order to
produce images of single complete objects from videos or photos.
The extracted images are used for calculating the object visual
features necessary for both the indexing and retrieval processes.
The first task of the algorithm exploits cues from motion analysis
to detect moving areas. Objects and background are then refined
using edge detection and region growing procedures, respectively.
These tasks are performed iteratively until objects and background
are completely resolved.
The developed method has been applied to a variety of indoor
and outdoor scenes in which objects of different types and shapes
appear on variously textured backgrounds.
Abstract: The latest Geographic Information System (GIS)
technology makes it possible to administer the spatial components of
everyday “business objects” in the corporate database, and to apply
suitable geographic analysis efficiently in desktop-focused
applications. Wireless Internet technology can be used to transfer
spatial data between server and client. However, wireless Internet
suffers from system bottlenecks that can make data transfer
inefficient, owing to the large volume of spatial data. Optimizing the
transfer and retrieval of data is therefore an essential issue. An
appropriate choice between the R-tree and Quadtree spatial indexing
methods can optimize this process. With the rapid proliferation of
spatial databases in the past decade, extensive research has been
conducted on the design of efficient data structures to enable fast
spatial searching. Commercial database vendors like Oracle have also
started implementing spatial indexing to cater to large and diverse
GIS applications. This paper focuses on the choice between R-tree
and Quadtree spatial indexing using the Oracle spatial database in a
mobile GIS application. Under our experimental conditions,
combining the Quadtree and R-tree spatial indexing methods in a
single spatial database reduces retrieval time by up to 42.5%.
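A minimal point quadtree illustrates the Quadtree side of the comparison; this is a generic sketch (not Oracle Spatial's implementation) showing how range queries prune whole quadrants.

```python
class QuadTree:
    """Minimal point quadtree over a square region: a node splits into four
    quadrants when it exceeds `cap` points; range queries prune quadrants."""
    def __init__(self, x0=0.0, y0=0.0, size=1.0, cap=4):
        self.x0, self.y0, self.size, self.cap = x0, y0, size, cap
        self.points, self.children = [], None

    def insert(self, p):
        if self.children is not None:
            self._child(p).insert(p)
        elif len(self.points) < self.cap:
            self.points.append(p)
        else:                                    # overflow: split and reinsert
            half = self.size / 2
            self.children = [QuadTree(self.x0 + dx * half, self.y0 + dy * half,
                                      half, self.cap)
                             for dy in (0, 1) for dx in (0, 1)]
            for q in self.points + [p]:
                self._child(q).insert(q)
            self.points = []

    def _child(self, p):
        half = self.size / 2
        i = (p[0] >= self.x0 + half) + 2 * (p[1] >= self.y0 + half)
        return self.children[i]

    def query(self, xmin, ymin, xmax, ymax):
        # prune this node if it does not intersect the query rectangle
        if xmax < self.x0 or ymax < self.y0 or \
           xmin > self.x0 + self.size or ymin > self.y0 + self.size:
            return []
        hits = [p for p in self.points
                if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
        if self.children is not None:
            for c in self.children:
                hits += c.query(xmin, ymin, xmax, ymax)
        return hits

tree = QuadTree(cap=2)
for p in [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8), (0.9, 0.1), (0.15, 0.85), (0.5, 0.5)]:
    tree.insert(p)
hits = tree.query(0.0, 0.0, 0.3, 0.3)   # only the lower-left quadrant is visited
```

An R-tree instead groups nearby objects under minimum bounding rectangles, which is why the two structures favour different data distributions and query shapes.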
Abstract: Object localization is one of the major challenges in
creating intelligent transportation systems. Unfortunately, in
densely built-up urban areas, localization based on GPS alone
produces large errors, or simply becomes impossible. New
opportunities for localization arise from the rapidly emerging
concept of wireless ad-hoc networks. Such a network allows the
distances between objects to be estimated from received signal
levels, yielding a distance graph whose nodes are the objects to be
localized and whose edges are estimates of the distances between
pairs of nodes. Given the known coordinates of some individual
nodes (anchors), it is possible to determine the location of all (or
part) of the remaining nodes of the graph. Moreover, a road map
available in digital format can provide localization routines with
valuable additional information to narrow the node location search.
However, despite an abundance of well-known algorithms for the
localization problem and significant research effort, many issues are
currently addressed only partially. In this paper, we propose a
localization approach based on mapping the distance graph onto
digital road map data. In effect, the problem is reduced to
embedding the distance graph into a graph representing the area's
geolocation data. This makes it possible to localize objects, in some
cases even if only one reference point is available. We propose a
simple embedding algorithm and a sample implementation as spatial
queries over sensor network data stored in a spatial database, which
allows effective use of spatial indexing, optimized spatial search
routines and geometry functions.
Abstract: Graphs have become increasingly important in
modeling complicated structures and schemaless data such as
proteins, chemical compounds, and XML documents. Given a graph
query, it is desirable to retrieve matching graphs quickly from a
large database via graph-based indices. Unlike existing methods, our
approach, called VFM (Vertex to Frequent Feature Mapping),
makes use of vertices and decision features as the basic indexing
features. VFM constructs two mappings between vertices and
frequent features to answer graph queries. The VFM approach not
only provides an elegant solution to the graph indexing problem,
but also demonstrates how database indexing and query processing
can benefit from data mining, especially frequent pattern mining.
The results show that the proposed method avoids enumerating the
subgraphs of the query graph, and effectively reduces the number of
subgraph isomorphism tests between the query graph and the graphs
in the candidate answer set during the verification stage.
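The filter-and-verify pattern VFM follows can be illustrated with a deliberately coarse, hypothetical filter: counting vertex labels gives a necessary condition for subgraph containment, pruning candidates before any expensive isomorphism test (VFM's actual vertex and frequent-feature mappings are much richer).

```python
from collections import Counter

def label_counts(graph):
    """graph: list of vertex labels (edges are ignored by this coarse filter)."""
    return Counter(graph)

def candidates(database, query):
    """A database graph can contain the query only if it has at least as many
    vertices of every label, so cheap counting prunes the candidate set."""
    q = label_counts(query)
    return [gid for gid, g in database.items()
            if all(label_counts(g)[lbl] >= n for lbl, n in q.items())]

db = {"g1": ["C", "C", "O", "H"], "g2": ["C", "N"], "g3": ["C", "C", "O"]}
candidates(db, ["C", "O"])   # → ['g1', 'g3']; g2 is pruned without any
                             # subgraph isomorphism test
```

Only the surviving candidates go to the verification stage, which is where the reduction in isomorphism tests reported by the paper comes from.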