Abstract: In the field of concepts, the measure of Wu and Palmer [1] has the advantage of being simple to implement and have good performances compared to the other similarity measures [2]. Nevertheless, the Wu and Palmer measure present the following disadvantage: in some situations, the similarity of two elements of an IS-A ontology contained in the neighborhood exceeds the similarity value of two elements contained in the same hierarchy. This situation is inadequate within the information retrieval framework. To overcome this problem, we propose a new similarity measure based on the Wu and Palmer measure. Our objective is to obtain realistic results for concepts not located in the same way. The obtained results show that compared to the Wu and Palmer approach, our measure presents a profit in terms of relevance and execution time.
Abstract: This paper proposes new hybrid approaches for face
recognition. Gabor wavelets representation of face images is an
effective approach for both facial action recognition and face
identification. Perform dimensionality reduction and linear
discriminate analysis on the down sampled Gabor wavelet faces can
increase the discriminate ability. Nearest feature space is extended to
various similarity measures. In our experiments, proposed Gabor
wavelet faces combined with extended neural net feature space
classifier shows very good performance, which can achieve 93 %
maximum correct recognition rate on ORL data set without any preprocessing
step.
Abstract: The similarity comparison of RNA secondary
structures is important in studying the functions of RNAs. In recent
years, most existing tools represent the secondary structures by
tree-based presentation and calculate the similarity by tree alignment
distance. Different to previous approaches, we propose a new method
based on maximum clique detection algorithm to extract the maximum
common structural elements in compared RNA secondary structures.
A new graph-based similarity measurement and maximum common
subgraph detection procedures for comparing purely RNA secondary
structures is introduced. Given two RNA secondary structures, the
proposed algorithm consists of a process to determine the score of the
structural similarity, followed by comparing vertices labelling, the
labelled edges and the exact degree of each vertex. The proposed
algorithm also consists of a process to extract the common structural
elements between compared secondary structures based on a proposed
maximum clique detection of the problem. This graph-based model
also can work with NC-IUB code to perform the pattern-based
searching. Therefore, it can be used to identify functional RNA motifs
from database or to extract common substructures between complex
RNA secondary structures. We have proved the performance of this
proposed algorithm by experimental results. It provides a new idea of
comparing RNA secondary structures. This tool is helpful to those
who are interested in structural bioinformatics.
Abstract: This paper introduces a measure of similarity between
two clusterings of the same dataset produced by two different
algorithms, or even the same algorithm (K-means, for instance, with
different initializations usually produce different results in clustering
the same dataset). We then apply the measure to calculate the
similarity between pairs of clusterings, with special interest directed
at comparing the similarity between various machine clusterings and
human clustering of datasets. The similarity measure thus can be used
to identify the best (in terms of most similar to human) clustering
algorithm for a specific problem at hand. Experimental results
pertaining to the text categorization problem of a Portuguese corpus
(wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The
significance and other potential applications of the proposed measure
are discussed.
Abstract: Detecting protein-protein interactions is a central problem in computational biology and aberrant such interactions may have implicated in a number of neurological disorders. As a result, the prediction of protein-protein interactions has recently received considerable attention from biologist around the globe. Computational tools that are capable of effectively identifying protein-protein interactions are much needed. In this paper, we propose a method to detect protein-protein interaction based on substring similarity measure. Two protein sequences may interact by the mean of the similarities of the substrings they contain. When applied on the currently available protein-protein interaction data for the yeast Saccharomyces cerevisiae, the proposed method delivered reasonable improvement over the existing ones.
Abstract: K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Abstract: Discovering new biological knowledge from the highthroughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily- and thereby functionally- related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on Fast Fourier Transform (FFT) is used. The proposed classifier uses multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance especially at the family level.
Abstract: In text categorization problem the most used method
for documents representation is based on words frequency vectors
called VSM (Vector Space Model). This representation is based only
on words from documents and in this case loses any “word context"
information found in the document. In this article we make a
comparison between the classical method of document representation
and a method called Suffix Tree Document Model (STDM) that is
based on representing documents in the Suffix Tree format. For the
STDM model we proposed a new approach for documents
representation and a new formula for computing the similarity
between two documents. Thus we propose to build the suffix tree
only for any two documents at a time. This approach is faster, it has
lower memory consumption and use entire document representation
without using methods for disposing nodes. Also for this method is
proposed a formula for computing the similarity between documents,
which improves substantially the clustering quality. This
representation method was validated using HAC - Hierarchical
Agglomerative Clustering. In this context we experiment also the
stemming influence in the document preprocessing step and highlight
the difference between similarity or dissimilarity measures to find
“closer" documents.
Abstract: The P-Bigram method is a string comparison methods
base on an internal two characters-based similarity measure. The edit
distance between two strings is the minimal number of elementary
editing operations required to transform one string into the other. The
elementary editing operations include deletion, insertion, substitution
two characters. In this paper, we address the P-Bigram method to
sole the similarity problem in DNA sequence. This method provided
an efficient algorithm that locates all minimum operation in a string.
We have been implemented algorithm and found that our program
calculated that smaller distance than one string. We develop PBigram
edit distance and show that edit distance or the similarity and
implementation using dynamic programming. The performance of
the proposed approach is evaluated using number edit and percentage
similarity measures.
Abstract: Skin color is an important visual cue for computer
vision systems involving human users. In this paper we combine skin
color and optical flow for detection and tracking of skin regions. We
apply these techniques to gesture recognition with encouraging
results. We propose a novel skin similarity measure. For grouping
detected skin regions we propose a novel skin region grouping
mechanism. The proposed techniques work with any number of skin
regions making them suitable for a multiuser scenario.
Abstract: This paper describes a novel and effective approach to content-based image retrieval (CBIR) that represents each image in the database by a vector of feature values called “Standard deviation of mean vectors of color distribution of rows and columns of images for CBIR". In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. This paper describes the approach that uses contents as feature vector for retrieval of similar images. There are several classes of features that are used to specify queries: colour, texture, shape, spatial layout. Colour features are often easily obtained directly from the pixel intensities. In this paper feature extraction is done for the texture descriptor that is 'variance' and 'Variance of Variances'. First standard deviation of each row and column mean is calculated for R, G, and B planes. These six values are obtained for one image which acts as a feature vector. Secondly we calculate variance of the row and column of R, G and B planes of an image. Then six standard deviations of these variance sequences are calculated to form a feature vector of dimension six. We applied our approach to a database of 300 BMP images. We have determined the capability of automatic indexing by analyzing image content: color and texture as features and by applying a similarity measure Euclidean distance.
Abstract: The paper proposes an approach using genetic algorithm for computing the region based image similarity. The image is denoted using a set of segmented regions reflecting color and texture properties of an image. An image is associated with a family of image features corresponding to the regions. The resemblance of two images is then defined as the overall similarity between two families of image features, and quantified by a similarity measure, which integrates properties of all the regions in the images. A genetic algorithm is applied to decide the most plausible matching. The performance of the proposed method is illustrated using examples from an image database of general-purpose images, and is shown to produce good results.
Abstract: This paper investigates the issue of building decision
trees from data with imprecise class values where imprecision is
encoded in the form of possibility distributions. The Information
Affinity similarity measure is introduced into the well-known gain
ratio criterion in order to assess the homogeneity of a set of
possibility distributions representing instances-s classes belonging to
a given training partition. For the experimental study, we proposed an
information affinity based performance criterion which we have used
in order to show the performance of the approach on well-known
benchmarks.
Abstract: The primary objective of the paper is to propose a new method for solving assignment problem under uncertain situation. In the classical assignment problem (AP), zpqdenotes the cost for assigning the qth job to the pth person which is deterministic in nature. Here in some uncertain situation, we have assigned a cost in the form of composite relative degree Fpq instead of and this replaced cost is in the maximization form. In this paper, it has been solved and validated by the two proposed algorithms, a new mathematical formulation of IVIF assignment problem has been presented where the cost has been considered to be an IVIFN and the membership of elements in the set can be explained by positive and negative evidences. To determine the composite relative degree of similarity of IVIFS the concept of similarity measure and the score function is used for validating the solution which is obtained by Composite relative similarity degree method. Further, hypothetical numeric illusion is conducted to clarify the method’s effectiveness and feasibility developed in the study. Finally, conclusion and suggestion for future work are also proposed.
Abstract: In this paper we study the fuzzy c-mean clustering algorithm
combined with principal components method. Demonstratively
analysis indicate that the new clustering method is well rather than
some clustering algorithms. We also consider the validity of clustering
method.
Abstract: The literature reports a large number of approaches for
measuring the similarity between protein sequences. Most of these
approaches estimate this similarity using alignment-based techniques
that do not necessarily yield biologically plausible results, for two
reasons.
First, for the case of non-alignable (i.e., not yet definitively aligned
and biologically approved) sequences such as multi-domain, circular
permutation and tandem repeat protein sequences, alignment-based
approaches do not succeed in producing biologically plausible results.
This is due to the nature of the alignment, which is based on the
matching of subsequences in equivalent positions, while non-alignable
proteins often have similar and conserved domains in non-equivalent
positions.
Second, the alignment-based approaches lead to similarity measures
that depend heavily on the parameters set by the user for the alignment
(e.g., gap penalties and substitution matrices). For easily alignable
protein sequences, it's possible to supply a suitable combination of
input parameters that allows such an approach to yield biologically
plausible results. However, for difficult-to-align protein sequences,
supplying different combinations of input parameters yields different
results. Such variable results create ambiguities and complicate the
similarity measurement task.
To overcome these drawbacks, this paper describes a novel and
effective approach for measuring the similarity between protein
sequences, called SAF for Substitution and Alignment Free. Without
resorting either to the alignment of protein sequences or to substitution
relations between amino acids, SAF is able to efficiently detect the
significant subsequences that best represent the intrinsic properties of
protein sequences, those underlying the chronological dependencies of
structural features and biochemical activities of protein sequences.
Moreover, by using a new efficient subsequence matching scheme,
SAF more efficiently handles protein sequences that contain similar
structural features with significant meaning in chronologically
non-equivalent positions. To show the effectiveness of SAF, extensive
experiments were performed on protein datasets from different
databases, and the results were compared with those obtained by
several mainstream algorithms.
Abstract: The paper presents an on-line recognition machine
(RM) for continuous/isolated, dynamic and static gestures that arise
in Flight Deck Officer (FDO) training. RM is based on generic pattern
recognition framework. Gestures are represented as templates using
summary statistics. The proposed recognition algorithm exploits temporal
and spatial characteristics of gestures via dynamic programming
and Markovian process. The algorithm predicts corresponding index
of incremental input data in the templates in an on-line mode.
Accumulated consistency in the sequence of prediction provides a
similarity measurement (Score) between input data and the templates.
The algorithm provides an intuitive mechanism for automatic detection
of start/end frames of continuous gestures. In the present paper,
we consider isolated gestures. The performance of RM is evaluated
using four datasets - artificial (W TTest), hand motion (Yang) and
FDO (tracker, vision-based ). RM achieves comparable results which
are in agreement with other on-line and off-line algorithms such as
hidden Markov model (HMM) and dynamic time warping (DTW).
The proposed algorithm has the additional advantage of providing
timely feedback for training purposes.
Abstract: We present a method to create special domain
collections from news sites. The method only requires a single
sample article as a seed. No prior corpus statistics are needed and the
method is applicable to multiple languages. We examine various
similarity measures and the creation of document collections for
English and Japanese. The main contributions are as follows. First,
the algorithm can build special domain collections from as little as
one sample document. Second, unlike other algorithms it does not
require a second “general" corpus to compute statistics. Third, in our
testing the algorithm outperformed others in creating collections
made up of highly relevant articles.
Abstract: Most of fuzzy clustering algorithms have some
discrepancies, e.g. they are not able to detect clusters with convex
shapes, the number of the clusters should be a priori known, they
suffer from numerical problems, like sensitiveness to the
initialization, etc. This paper studies the synergistic combination of
the hierarchical and graph theoretic minimal spanning tree based
clustering algorithm with the partitional Gath-Geva fuzzy clustering
algorithm. The aim of this hybridization is to increase the robustness
and consistency of the clustering results and to decrease the number
of the heuristically defined parameters of these algorithms to
decrease the influence of the user on the clustering results. For the
analysis of the resulted fuzzy clusters a new fuzzy similarity measure
based tool has been presented. The calculated similarities of the
clusters can be used for the hierarchical clustering of the resulted
fuzzy clusters, which information is useful for cluster merging and
for the visualization of the clustering results. As the examples used
for the illustration of the operation of the new algorithm will show,
the proposed algorithm can detect clusters from data with arbitrary
shape and does not suffer from the numerical problems of the
classical Gath-Geva fuzzy clustering algorithm.
Abstract: Intuitionistic fuzzy sets as proposed by Atanassov,
have gained much attention from past and latter researchers for
applications in various fields. Similarity measures between
intuitionistic fuzzy sets were developed afterwards. However, it does
not cater the conflicting behavior of each element evaluated. We
therefore made some modification to the similarity measure of IFS
by considering conflicting concept to the model. In this paper, we
concentrate on Zhang and Fu-s similarity measures for IFSs and
some examples are given to validate these similarity measures. A
simple modification to Zhang and Fu-s similarity measures of IFSs
was proposed to find the best result according to the use of degree of
indeterminacy. Finally, we mark up with the application to real
decision making problems.