Abstract: Image registration is the process of establishing point
by point correspondence between images obtained from a same
scene. This process is very useful in remote sensing, medicine,
cartography, computer vision, etc. Then, the task of registration is to
place the data into a common reference frame by estimating the
transformations between the data sets. In this work, we develop a
rigid point registration method based on the application of genetic
algorithms and Hausdorff distance. First, we extract the feature points
from both images based on the algorithm of global and local
curvature corner. After refining the feature points, we use Hausdorff
distance as similarity measure between the two data sets and for
optimizing the search space we use genetic algorithms to achieve
high computation speed for its inertial parallel. The results show the
efficiency of this method for registration of satellite images.
Abstract: Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
Abstract: The paper presents an on-line recognition machine
(RM) for continuous/isolated, dynamic and static gestures that arise
in Flight Deck Officer (FDO) training. RM is based on generic pattern
recognition framework. Gestures are represented as templates using
summary statistics. The proposed recognition algorithm exploits temporal
and spatial characteristics of gestures via dynamic programming
and Markovian process. The algorithm predicts corresponding index
of incremental input data in the templates in an on-line mode.
Accumulated consistency in the sequence of prediction provides a
similarity measurement (Score) between input data and the templates.
The algorithm provides an intuitive mechanism for automatic detection
of start/end frames of continuous gestures. In the present paper,
we consider isolated gestures. The performance of RM is evaluated
using four datasets - artificial (W TTest), hand motion (Yang) and
FDO (tracker, vision-based ). RM achieves comparable results which
are in agreement with other on-line and off-line algorithms such as
hidden Markov model (HMM) and dynamic time warping (DTW).
The proposed algorithm has the additional advantage of providing
timely feedback for training purposes.
Abstract: FAQ system can make user find answer to the problem that puzzles them. But now the research on Chinese FAQ system is still on the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance the efficiency, a small pool of the candidate question-answering pairs retrieved from the system for the follow-up work according to the concept of the agriculture domain extracted from user input .Input queries or questions are converted into four parts, the question word segment (QWS), the verb segment (VS), the concept of agricultural areas segment (CS), the auxiliary segment (AS). A semantic matching method is presented to estimate the similarity between the semantic segments of the query and the questions in the pool of the candidate. A thesaurus constructed from the HowNet, a Chinese knowledge base, is adopted for word similarity measure in the matcher. The questions are classified into eleven intension categories using predefined question stemming keywords. For FAQ mining, given a query, the question part and answer part in an FAQ question-answer pair is matched with the input query, respectively. Finally, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. These approaches are experimented on an agriculture FAQ system. Experimental results indicate that the proposed approach outperformed the FAQ-Finder system in agriculture FAQ retrieval.
Abstract: The most important property of the Gene Ontology is
the terms. These control vocabularies are defined to provide
consistent descriptions of gene products that are shareable and
computationally accessible by humans, software agent, or other
machine-readable meta-data. Each term is associated with
information such as definition, synonyms, database references, amino
acid sequences, and relationships to other terms. This information has
made the Gene Ontology broadly applied in microarray and
proteomic analysis. However, the process of searching the terms is
still carried out using traditional approach which is based on keyword
matching. The weaknesses of this approach are: ignoring semantic
relationships between terms, and highly depending on a specialist to
find similar terms. Therefore, this study combines semantic similarity
measure and genetic algorithm to perform a better retrieval process
for searching semantically similar terms. The semantic similarity
measure is used to compute similitude strength between two terms.
Then, the genetic algorithm is employed to perform batch retrievals
and to handle the situation of the large search space of the Gene
Ontology graph. The computational results are presented to show the
effectiveness of the proposed algorithm.
Abstract: Due to the tremendous amount of information provided
by the World Wide Web (WWW) developing methods for mining
the structure of web-based documents is of considerable interest. In
this paper we present a similarity measure for graphs representing
web-based hypertext structures. Our similarity measure is mainly
based on a novel representation of a graph as linear integer strings,
whose components represent structural properties of the graph. The
similarity of two graphs is then defined as the optimal alignment of
the underlying property strings. In this paper we apply the well known
technique of sequence alignments for solving a novel and challenging
problem: Measuring the structural similarity of generalized trees.
In other words: We first transform our graphs considered as high
dimensional objects in linear structures. Then we derive similarity
values from the alignments of the property strings in order to
measure the structural similarity of generalized trees. Hence, we
transform a graph similarity problem to a string similarity problem for
developing a efficient graph similarity measure. We demonstrate that
our similarity measure captures important structural information by
applying it to two different test sets consisting of graphs representing
web-based document structures.
Abstract: A dissimilarity measure between the empiric
characteristic functions of the subsamples associated to the different
classes in a multivariate data set is proposed. This measure can be
efficiently computed, and it depends on all the cases of each class. It
may be used to find groups of similar classes, which could be joined
for further analysis, or it could be employed to perform an
agglomerative hierarchical cluster analysis of the set of classes. The
final tree can serve to build a family of binary classification models,
offering an alternative approach to the multi-class SVM problem. We
have tested this dendrogram based SVM approach with the oneagainst-
one SVM approach over four publicly available data sets,
three of them being microarray data. Both performances have been
found equivalent, but the first solution requires a smaller number of
binary SVM models.
Abstract: Text similarity measurement is a fundamental issue in
many textual applications such as document clustering, classification,
summarization and question answering. However, prevailing approaches
based on Vector Space Model (VSM) more or less suffer
from the limitation of Bag of Words (BOW), which ignores the semantic
relationship among words. Enriching document representation
with background knowledge from Wikipedia is proven to be an effective
way to solve this problem, but most existing methods still
cannot avoid similar flaws of BOW in a new vector space. In this
paper, we propose a novel text similarity measurement which goes
beyond VSM and can find semantic affinity between documents.
Specifically, it is a unified graph model that exploits Wikipedia as
background knowledge and synthesizes both document representation
and similarity computation. The experimental results on two different
datasets show that our approach significantly improves VSM-based
methods in both text clustering and classification.
Abstract: Clustering categorical data is more complicated than
the numerical clustering because of its special properties. Scalability
and memory constraint is the challenging problem in clustering large
data set. This paper presents an incremental algorithm to cluster the
categorical data. Frequencies of attribute values contribute much in
clustering similar categorical objects. In this paper we propose new
similarity measures based on the frequencies of attribute values and
its cardinalities. The proposed measures and the algorithm are
experimented with the data sets from UCI data repository. Results
prove that the proposed method generates better clusters than the
existing one.
Abstract: Iris-based biometric authentication is gaining importance
in recent times. Iris biometric processing however, is a complex
process and computationally very expensive. In the overall processing
of iris biometric in an iris-based biometric authentication system,
feature processing is an important task. In feature processing, we extract
iris features, which are ultimately used in matching. Since there
is a large number of iris features and computational time increases
as the number of features increases, it is therefore a challenge to
develop an iris processing system with as few as possible number of
features and at the same time without compromising the correctness.
In this paper, we address this issue and present an approach to feature
extraction and feature matching process. We apply Daubechies D4
wavelet with 4 levels to extract features from iris images. These
features are encoded with 2 bits by quantizing into 4 quantization
levels. With our proposed approach it is possible to represent an
iris template with only 304 bits, whereas existing approaches require
as many as 1024 bits. In addition, we assign different weights to
different iris region to compare two iris templates which significantly
increases the accuracy. Further, we match the iris template based on
a weighted similarity measure. Experimental results on several iris
databases substantiate the efficacy of our approach.
Abstract: In this paper, we propose effective system for digital music retrieval. We divided proposed system into Client and Server. Client part consists of pre-processing and Content-based feature extraction stages. In pre-processing stage, we minimized Time code Gap that is occurred among same music contents. As content-based feature, first-order differentiated MFCC were used. These presented approximately envelop of music feature sequences. Server part included Music Server and Music Matching stage. Extracted features from 1,000 digital music files were stored in Music Server. In Music Matching stage, we found retrieval result through similarity measure by DTW. In experiment, we used 450 queries. These were made by mixing different compression standards and sound qualities from 50 digital music files. Retrieval accurate indicated 97% and retrieval time was average 15ms in every single query. Out experiment proved that proposed system is effective in retrieve digital music and robust at various user environments of web.
Abstract: This paper proposes an algorithm which automatically aligns and stitches the component medical images (fluoroscopic) with varying degrees of overlap into a single composite image. The alignment method is based on similarity measure between the component images. As applied here the technique is intensity based rather than feature based. It works well in domains where feature based methods have difficulty, yet more robust than traditional correlation. Component images are stitched together using the new triangular averaging based blending algorithm. The quality of the resultant image is tested for photometric inconsistencies and geometric misalignments. This method cannot correct rotational, scale and perspective artifacts.
Abstract: This paper is concerned with the production of an Arabic word semantic similarity benchmark dataset. It is the first of its kind for Arabic which was particularly developed to assess the accuracy of word semantic similarity measurements. Semantic similarity is an essential component to numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English. To the best of our knowledge, there is no word similarity measure developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and best possible available techniques have been used in this study to produce the Arabic dataset. This includes selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS and hopefully it will be considered as a reference basis from which to evaluate and compare different methodologies in the field.
Abstract: Most of the biclustering/projected clustering algorithms are based either on the Euclidean distance or correlation coefficient which capture only linear relationships. However, in many applications, like gene expression data and word-document data, non linear relationships may exist between the objects. Mutual Information between two variables provides a more general criterion to investigate dependencies amongst variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing algorithms for biclustering have used mutual information as a similarity measure. We present the experimental results on synthetic data as well as on the yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.
Abstract: New graph similarity methods have been proposed in this work with the aim to refining the chemical information extracted from molecules matching. For this purpose, data fusion of the isomorphic and nonisomorphic subgraphs into a new similarity measure, the Approximate Similarity, was carried out by several approaches. The application of the proposed method to the development of quantitative structure-activity relationships (QSAR) has provided reliable tools for predicting several pharmacological parameters: binding of steroids to the globulin-corticosteroid receptor, the activity of benzodiazepine receptor compounds, and the blood brain barrier permeability. Acceptable results were obtained for the models presented here.
Abstract: Functioning of a biometric system in large part
depends on the performance of the similarity measure function.
Frequently a generalized similarity distance measure function such as
Euclidian distance or Mahalanobis distance is applied to the task of
matching biometric feature vectors. However, often accuracy of a
biometric system can be greatly improved by designing a customized
matching algorithm optimized for a particular biometric application.
In this paper we propose a tailored similarity measure function for
behavioral biometric systems based on the expert knowledge of the
feature level data in the domain. We compare performance of a
proposed matching algorithm to that of other well known similarity
distance functions and demonstrate its superiority with respect to the
chosen domain.
Abstract: Lacking an inherent “natural" dissimilarity measure
between objects in categorical dataset presents special difficulties in
clustering analysis. However, each categorical attributes from a given
dataset provides natural probability and information in the sense of
Shannon. In this paper, we proposed a novel method which
heuristically converts categorical attributes to numerical values by
exploiting such associated information. We conduct an experimental
study with real-life categorical dataset. The experiment demonstrates
the effectiveness of our approach.
Abstract: Methods for organizing web data into groups in order
to analyze web-based hypertext data and facilitate data availability
are very important in terms of the number of documents available
online. Thereby, the task of clustering web-based document structures
has many applications, e.g., improving information retrieval on the
web, better understanding of user navigation behavior, improving web
users requests servicing, and increasing web information accessibility.
In this paper we investigate a new approach for clustering web-based
hypertexts on the basis of their graph structures. The hypertexts will
be represented as so called generalized trees which are more general
than usual directed rooted trees, e.g., DOM-Trees. As a important
preprocessing step we measure the structural similarity between the
generalized trees on the basis of a similarity measure d. Then,
we apply agglomerative clustering to the obtained similarity matrix
in order to create clusters of hypertext graph patterns representing
navigation structures. In the present paper we will run our approach
on a data set of hypertext structures and obtain good results in
Web Structure Mining. Furthermore we outline the application of
our approach in Web Usage Mining as future work.
Abstract: Cluster analysis divides data into groups that are
meaningful, useful, or both. Analysis of biological data is creating a
new generation of epidemiologic, prognostic, diagnostic and
treatment modalities. Clustering of protein sequences is one of the
current research topics in the field of computer science. Linear
relation is valuable in rule discovery for a given data, such as if value
X goes up 1, value Y will go down 3", etc. The classical linear
regression models the linear relation of two sequences perfectly.
However, if we need to cluster a large repository of protein sequences
into groups where sequences have strong linear relationship with
each other, it is prohibitively expensive to compare sequences one by
one. In this paper, we propose a new technique named General
Regression Model Technique Clustering Algorithm (GRMTCA) to
benignly handle the problem of linear sequences clustering. GRMT
gives a measure, GR*, to tell the degree of linearity of multiple
sequences without having to compare each pair of them.
Abstract: In this paper we study different similarity based approaches for the development of QSAR model devoted to the prediction of activity of antiobesity drugs. Classical similarity approaches are compared regarding to dissimilarity models based on the consideration of the calculation of Euclidean distances between the nonisomorphic fragments extracted in the matching process. Combining the classical similarity and dissimilarity approaches into a new similarity measure, the Approximate Similarity was also studied, and better results were obtained. The application of the proposed method to the development of quantitative structure-activity relationships (QSAR) has provided reliable tools for predicting of inhibitory activity of drugs. Acceptable results were obtained for the models presented here.