Abstract: Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages Automatic Identification System (AIS) messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring AIS messages. Our RRo method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measures to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Frechet Distance (DFD), Dynamic Time ´ Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an f-measure of 99.08% using Dynamic Time Warping (DTW) similarity measure.
Abstract: Vehicle Routing Problem (VRP) is a complex combinatorial optimization problem and it is quite difficult to find an optimal solution consisting of a set of routes for vehicles whose total cost is minimum. Evolutionary and swarm intelligent (SI) algorithms play a vital role in solving optimization problems. While the SI algorithms perform search, the diversity between the solutions they exploit is very important. This is because of the need to avoid early convergence and to get an appropriate balance between the exploration and exploitation. Therefore, it is important to check how far the solutions are diverse. In this paper, we measure the similarity between solutions, which ABC exploits while optimizing VRP. The similar solutions found are discarded at the end of the iteration and only unique solutions are passed on to the next iteration. The bees of discarded solutions become scouts and they start searching for new solutions. This process is continued and results show that the solution is optimized at lesser number of iterations but with the overhead of computing similarity in all the iterations. The problem instance from Solomon benchmarked dataset has been used for evaluating the presented methodology.
Abstract: Nowadays, ontologies are used for achieving a
common understanding within a user community and for sharing
domain knowledge. However, the de-centralized nature of the web
makes indeed inevitable that small communities will use their own
ontologies to describe their data and to index their own resources.
Certainly, accessing to resources from various ontologies created
independently is an important challenge for answering end user
queries. Ontology mapping is thus required for combining ontologies.
However, mapping complete ontologies at run time is a
computationally expensive task. This paper proposes a system in
which mappings between concepts may be generated dynamically as
the concepts are encountered during user queries. In this way, the
interaction itself defines the context in which small and relevant
portions of ontologies are mapped. We illustrate application of the
proposed system in the context of Technology Enhanced Learning
(TEL) where learners need to access to learning resources covering
specific concepts.
Abstract: In this work, we begin with the presentation of the
Tθ family of usual similarity measures concerning multidimensional
binary data. Subsequently, some properties of these measures are
proposed. Finally the impact of the use of different inter-elements
measures on the results of the Agglomerative Hierarchical Clustering
Methods is studied.
Abstract: The process in which the complementary information from multiple images is integrated to provide composite image that contains more information than the original input images is called image fusion. Medical image fusion provides useful information from multimodality medical images that provides additional information to the doctor for diagnosis of diseases in a better way. This paper represents the wavelet based medical image fusion algorithm on different multimodality medical images. In order to fuse the medical images, images are decomposed using Redundant Wavelet Transform (RWT). The high frequency coefficients are convolved with morphological operator followed by the maximum-selection (MS) rule. The low frequency coefficients are processed by MS rule. The reconstructed image is obtained by inverse RWT. The quantitative measures which includes Mean, Standard Deviation, Average Gradient, Spatial frequency, Edge based Similarity Measures are considered for evaluating the fused images. The performance of this proposed method is compared with Pixel averaging, PCA, and DWT fusion methods. When compared with conventional methods, the proposed framework provides better performance for analysis of multimodality medical images.
Abstract: Clustering categorical data is more complicated than
the numerical clustering because of its special properties. Scalability
and memory constraint is the challenging problem in clustering large
data set. This paper presents an incremental algorithm to cluster the
categorical data. Frequencies of attribute values contribute much in
clustering similar categorical objects. In this paper we propose new
similarity measures based on the frequencies of attribute values and
its cardinalities. The proposed measures and the algorithm are
experimented with the data sets from UCI data repository. Results
prove that the proposed method generates better clusters than the
existing one.
Abstract: In the field of concepts, the measure of Wu and Palmer [1] has the advantage of being simple to implement and have good performances compared to the other similarity measures [2]. Nevertheless, the Wu and Palmer measure present the following disadvantage: in some situations, the similarity of two elements of an IS-A ontology contained in the neighborhood exceeds the similarity value of two elements contained in the same hierarchy. This situation is inadequate within the information retrieval framework. To overcome this problem, we propose a new similarity measure based on the Wu and Palmer measure. Our objective is to obtain realistic results for concepts not located in the same way. The obtained results show that compared to the Wu and Palmer approach, our measure presents a profit in terms of relevance and execution time.
Abstract: This paper proposes new hybrid approaches for face
recognition. Gabor wavelets representation of face images is an
effective approach for both facial action recognition and face
identification. Perform dimensionality reduction and linear
discriminate analysis on the down sampled Gabor wavelet faces can
increase the discriminate ability. Nearest feature space is extended to
various similarity measures. In our experiments, proposed Gabor
wavelet faces combined with extended neural net feature space
classifier shows very good performance, which can achieve 93 %
maximum correct recognition rate on ORL data set without any preprocessing
step.
Abstract: This paper introduces a measure of similarity between
two clusterings of the same dataset produced by two different
algorithms, or even the same algorithm (K-means, for instance, with
different initializations usually produce different results in clustering
the same dataset). We then apply the measure to calculate the
similarity between pairs of clusterings, with special interest directed
at comparing the similarity between various machine clusterings and
human clustering of datasets. The similarity measure thus can be used
to identify the best (in terms of most similar to human) clustering
algorithm for a specific problem at hand. Experimental results
pertaining to the text categorization problem of a Portuguese corpus
(wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The
significance and other potential applications of the proposed measure
are discussed.
Abstract: Discovering new biological knowledge from the highthroughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily- and thereby functionally- related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on Fast Fourier Transform (FFT) is used. The proposed classifier uses multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance especially at the family level.
Abstract: In text categorization problem the most used method
for documents representation is based on words frequency vectors
called VSM (Vector Space Model). This representation is based only
on words from documents and in this case loses any “word context"
information found in the document. In this article we make a
comparison between the classical method of document representation
and a method called Suffix Tree Document Model (STDM) that is
based on representing documents in the Suffix Tree format. For the
STDM model we proposed a new approach for documents
representation and a new formula for computing the similarity
between two documents. Thus we propose to build the suffix tree
only for any two documents at a time. This approach is faster, it has
lower memory consumption and use entire document representation
without using methods for disposing nodes. Also for this method is
proposed a formula for computing the similarity between documents,
which improves substantially the clustering quality. This
representation method was validated using HAC - Hierarchical
Agglomerative Clustering. In this context we experiment also the
stemming influence in the document preprocessing step and highlight
the difference between similarity or dissimilarity measures to find
“closer" documents.
Abstract: The P-Bigram method is a string comparison methods
base on an internal two characters-based similarity measure. The edit
distance between two strings is the minimal number of elementary
editing operations required to transform one string into the other. The
elementary editing operations include deletion, insertion, substitution
two characters. In this paper, we address the P-Bigram method to
sole the similarity problem in DNA sequence. This method provided
an efficient algorithm that locates all minimum operation in a string.
We have been implemented algorithm and found that our program
calculated that smaller distance than one string. We develop PBigram
edit distance and show that edit distance or the similarity and
implementation using dynamic programming. The performance of
the proposed approach is evaluated using number edit and percentage
similarity measures.
Abstract: The primary objective of the paper is to propose a new method for solving assignment problem under uncertain situation. In the classical assignment problem (AP), zpqdenotes the cost for assigning the qth job to the pth person which is deterministic in nature. Here in some uncertain situation, we have assigned a cost in the form of composite relative degree Fpq instead of and this replaced cost is in the maximization form. In this paper, it has been solved and validated by the two proposed algorithms, a new mathematical formulation of IVIF assignment problem has been presented where the cost has been considered to be an IVIFN and the membership of elements in the set can be explained by positive and negative evidences. To determine the composite relative degree of similarity of IVIFS the concept of similarity measure and the score function is used for validating the solution which is obtained by Composite relative similarity degree method. Further, hypothetical numeric illusion is conducted to clarify the method’s effectiveness and feasibility developed in the study. Finally, conclusion and suggestion for future work are also proposed.
Abstract: In this paper we study the fuzzy c-mean clustering algorithm
combined with principal components method. Demonstratively
analysis indicate that the new clustering method is well rather than
some clustering algorithms. We also consider the validity of clustering
method.
Abstract: The literature reports a large number of approaches for
measuring the similarity between protein sequences. Most of these
approaches estimate this similarity using alignment-based techniques
that do not necessarily yield biologically plausible results, for two
reasons.
First, for the case of non-alignable (i.e., not yet definitively aligned
and biologically approved) sequences such as multi-domain, circular
permutation and tandem repeat protein sequences, alignment-based
approaches do not succeed in producing biologically plausible results.
This is due to the nature of the alignment, which is based on the
matching of subsequences in equivalent positions, while non-alignable
proteins often have similar and conserved domains in non-equivalent
positions.
Second, the alignment-based approaches lead to similarity measures
that depend heavily on the parameters set by the user for the alignment
(e.g., gap penalties and substitution matrices). For easily alignable
protein sequences, it's possible to supply a suitable combination of
input parameters that allows such an approach to yield biologically
plausible results. However, for difficult-to-align protein sequences,
supplying different combinations of input parameters yields different
results. Such variable results create ambiguities and complicate the
similarity measurement task.
To overcome these drawbacks, this paper describes a novel and
effective approach for measuring the similarity between protein
sequences, called SAF for Substitution and Alignment Free. Without
resorting either to the alignment of protein sequences or to substitution
relations between amino acids, SAF is able to efficiently detect the
significant subsequences that best represent the intrinsic properties of
protein sequences, those underlying the chronological dependencies of
structural features and biochemical activities of protein sequences.
Moreover, by using a new efficient subsequence matching scheme,
SAF more efficiently handles protein sequences that contain similar
structural features with significant meaning in chronologically
non-equivalent positions. To show the effectiveness of SAF, extensive
experiments were performed on protein datasets from different
databases, and the results were compared with those obtained by
several mainstream algorithms.
Abstract: We present a method to create special domain
collections from news sites. The method only requires a single
sample article as a seed. No prior corpus statistics are needed and the
method is applicable to multiple languages. We examine various
similarity measures and the creation of document collections for
English and Japanese. The main contributions are as follows. First,
the algorithm can build special domain collections from as little as
one sample document. Second, unlike other algorithms it does not
require a second “general" corpus to compute statistics. Third, in our
testing the algorithm outperformed others in creating collections
made up of highly relevant articles.
Abstract: Intuitionistic fuzzy sets as proposed by Atanassov,
have gained much attention from past and latter researchers for
applications in various fields. Similarity measures between
intuitionistic fuzzy sets were developed afterwards. However, it does
not cater the conflicting behavior of each element evaluated. We
therefore made some modification to the similarity measure of IFS
by considering conflicting concept to the model. In this paper, we
concentrate on Zhang and Fu-s similarity measures for IFSs and
some examples are given to validate these similarity measures. A
simple modification to Zhang and Fu-s similarity measures of IFSs
was proposed to find the best result according to the use of degree of
indeterminacy. Finally, we mark up with the application to real
decision making problems.