Abstract: Classification is an important topic in machine learning
and bioinformatics. Many datasets have been introduced for
classification tasks. A dataset contains multiple features, and the quality of features influences the classification accuracy of the dataset.
The power of classification for each feature differs. In this study, we
suggest the Classification Influence Index (CII) as an indicator of classification power for each feature. CII enables evaluation of the
features in a dataset and improved classification accuracy by transformation of the dataset. By conducting experiments using CII
and the k-nearest neighbor classifier to analyze real datasets, we confirmed that the proposed index provided meaningful improvement
of the classification accuracy.
Abstract: The method of gait identification based on the nearest neighbor classification technique with motion similarity assessment by the dynamic time warping is proposed. The model based kinematic motion data, represented by the joints rotations coded by Euler angles and unit quaternions is used. The different pose distance functions in Euler angles and quaternion spaces are considered. To evaluate individual features of the subsequent joints movements during gait cycle, joint selection is carried out. To examine proposed approach database containing 353 gaits of 25 humans collected in motion capture laboratory is used. The obtained results are promising. The classifications, which takes into consideration all joints has accuracy over 91%. Only analysis of movements of hip joints allows to correctly identify gaits with almost 80% precision.
Abstract: Classification of Persian printed numeral characters
has been considered and a proposed system has been introduced. In
representation stage, for the first time in Persian optical character
recognition, extended moment invariants has been utilized as
characters image descriptor. In classification stage, four different
classifiers namely minimum mean distance, nearest neighbor rule,
multi layer perceptron, and fuzzy min-max neural network has been
used, which first and second are traditional nonparametric statistical
classifier. Third is a well-known neural network and forth is a kind of
fuzzy neural network that is based on utilizing hyperbox fuzzy sets.
Set of different experiments has been done and variety of results has
been presented. The results showed that extended moment invariants
are qualified as features to classify Persian printed numeral
characters.
Abstract: During the process of compaction in Hot-Mix Asphalt
(HMA) mixtures, the distance between aggregate particles decreases
as they come together and eliminate air-voids. By measuring the
inter-particle distances in a cut-section of a HMA sample the degree
of compaction can be estimated. For this, a calibration curve is
generated by computer simulation technique when the gradation and
asphalt content of the HMA mixture are known. A two-dimensional
cross section of HMA specimen was simulated using the mixture
design information (gradation, asphalt content and air-void content).
Nearest neighbor distance methods such as Delaunay triangulation
were used to study the changes in inter-particle distance and area
distribution during the process of compaction in HMA. Such
computer simulations would enable making several hundreds of
repetitions in a short period of time without the necessity to compact
and analyze laboratory specimens in order to obtain good statistics on
the parameters defined. The distributions for the statistical
parameters based on computer simulations showed similar trends as
those of laboratory specimens.
Abstract: Globalization and therefore increasing tight competition among companies, have resulted to increase the importance of making well-timed decision. Devising and employing effective strategies, that are flexible and adaptive to changing market, stand a greater chance of being effective in the long-term. In other side, a clear focus on managing the entire product lifecycle has emerged as critical areas for investment. Therefore, applying wellorganized tools to employ past experience in new case, helps to make proper and managerial decisions. Case based reasoning (CBR) is based on a means of solving a new problem by using or adapting solutions to old problems. In this paper, an adapted CBR model with k-nearest neighbor (K-NN) is employed to provide suggestions for better decision making which are adopted for a given product in the middle of life phase. The set of solutions are weighted by CBR in the principle of group decision making. Wrapper approach of genetic algorithm is employed to generate optimal feature subsets. The dataset of the department store, including various products which are collected among two years, have been used. K-fold approach is used to evaluate the classification accuracy rate. Empirical results are compared with classical case based reasoning algorithm which has no special process for feature selection, CBR-PCA algorithm based on filter approach feature selection, and Artificial Neural Network. The results indicate that the predictive performance of the model, compare with two CBR algorithms, in specific case is more effective.
Abstract: Locality Sensitive Hashing (LSH) is one of the most
promising techniques for solving nearest neighbour search problem in
high dimensional space. Euclidean LSH is the most popular variation
of LSH that has been successfully applied in many multimedia
applications. However, the Euclidean LSH presents limitations that
affect structure and query performances. The main limitation of the
Euclidean LSH is the large memory consumption. In order to achieve
a good accuracy, a large number of hash tables is required. In this
paper, we propose a new hashing algorithm to overcome the storage
space problem and improve query time, while keeping a good
accuracy as similar to that achieved by the original Euclidean LSH.
The Experimental results on a real large-scale dataset show that the
proposed approach achieves good performances and consumes less
memory than the Euclidean LSH.
Abstract: Naive Bayes Nearest Neighbor (NBNN) and its variants, i,e., local NBNN and the NBNN kernels, are local feature-based classifiers that have achieved impressive performance in image classification. By exploiting instance-to-class (I2C) distances (instance means image/video in image/video classification), they avoid quantization errors of local image descriptors in the bag of words (BoW) model. However, the performances of NBNN, local NBNN and the NBNN kernels have not been validated on video analysis. In this paper, we introduce these three classifiers into human action recognition and conduct comprehensive experiments on the benchmark KTH and the realistic HMDB datasets. The results shows that those I2C based classifiers consistently outperform the SVM classifier with the BoW model.
Abstract: This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.
Abstract: Tumor classification is a key area of research in the
field of bioinformatics. Microarray technology is commonly used in
the study of disease diagnosis using gene expression levels. The
main drawback of gene expression data is that it contains thousands
of genes and a very few samples. Feature selection methods are used
to select the informative genes from the microarray. These methods
considerably improve the classification accuracy. In the proposed
method, Genetic Algorithm (GA) is used for effective feature
selection. Informative genes are identified based on the T-Statistics,
Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate
solutions of GA are obtained from top-m informative genes. The
classification accuracy of k-Nearest Neighbor (kNN) method is used
as the fitness function for GA. In this work, kNN and Support Vector
Machine (SVM) are used as the classifiers. The experimental results
show that the proposed work is suitable for effective feature
selection. With the help of the selected genes, GA-kNN method
achieves 100% accuracy in 4 datasets and GA-SVM method
achieves in 5 out of 10 datasets. The GA with kNN and SVM
methods are demonstrated to be an accurate method for microarray
based tumor classification.
Abstract: The objective of this paper, is to apply support vector machine (SVM) approach for the classification of cancerous and normal regions of prostate images. Three kinds of textural features are extracted and used for the analysis: parameters of the Gauss- Markov random field (GMRF), correlation function and relative entropy. Prostate images are acquired by the system consisting of a microscope, video camera and a digitizing board. Cross-validated classification over a database of 46 images is implemented to evaluate the performance. In SVM classification, sensitivity and specificity of 96.2% and 97.0% are achieved for the 32x32 pixel block sized data, respectively, with an overall accuracy of 96.6%. Classification performance is compared with artificial neural network and k-nearest neighbor classifiers. Experimental results demonstrate that the SVM approach gives the best performance.
Abstract: In this study, a high accuracy protein-protein interaction
prediction method is developed. The importance of the proposed
method is that it only uses sequence information of proteins while
predicting interaction. The method extracts phylogenetic profiles of
proteins by using their sequence information. Combining the phylogenetic
profiles of two proteins by checking existence of homologs
in different species and fitting this combined profile into a statistical
model, it is possible to make predictions about the interaction status
of two proteins.
For this purpose, we apply a collection of pattern recognition
techniques on the dataset of combined phylogenetic profiles of protein
pairs. Support Vector Machines, Feature Extraction using ReliefF,
Naive Bayes Classification, K-Nearest Neighborhood Classification,
Decision Trees, and Random Forest Classification are the methods
we applied for finding the classification method that best predicts
the interaction status of protein pairs. Random Forest Classification
outperformed all other methods with a prediction accuracy of 76.93%
Abstract: We present here the results for a comparative study of
some techniques, available in the literature, related to the relevance
feedback mechanism in the case of a short-term learning. Only one
method among those considered here is belonging to the data mining
field which is the K-nearest neighbors algorithm (KNN) while the
rest of the methods is related purely to the information retrieval field
and they fall under the purview of the following three major axes:
Shifting query, Feature Weighting and the optimization of the
parameters of similarity metric. As a contribution, and in addition to
the comparative purpose, we propose a new version of the KNN
algorithm referred to as an incremental KNN which is distinct from
the original version in the sense that besides the influence of the
seeds, the rate of the actual target image is influenced also by the
images already rated. The results presented here have been obtained
after experiments conducted on the Wang database for one iteration
and utilizing color moments on the RGB space. This compact
descriptor, Color Moments, is adequate for the efficiency purposes
needed in the case of interactive systems. The results obtained allow
us to claim that the proposed algorithm proves good results; it even
outperforms a wide range of techniques available in the literature.
Abstract: Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association. However, research is limited by the cost of genotyping the tremendous number of SNPs. Therefore, it is important to identify a small subset of informative SNPs, the so-called tag SNPs. This subset consists of selected SNPs of the genotypes, and accurately represents the rest of the SNPs. Furthermore, an effective evaluation method is needed to evaluate prediction accuracy of a set of tag SNPs. In this paper, a genetic algorithm (GA) is applied to tag SNP problems, and the K-nearest neighbor (K-NN) serves as a prediction method of tag SNP selection. The experimental data used was taken from the HapMap project; it consists of genotype data rather than haplotype data. The proposed method consistently identified tag SNPs with considerably better prediction accuracy than methods from the literature. At the same time, the number of tag SNPs identified was smaller than the number of tag SNPs in the other methods. The run time of the proposed method was much shorter than the run time of the SVM/STSA method when the same accuracy was reached.
Abstract: Serial Analysis of Gene Expression is a powerful
quantification technique for generating cell or tissue gene expression
data. The profile of the gene expression of cell or tissue in several
different states is difficult for biologists to analyze because of the large
number of genes typically involved. However, feature selection in
machine learning can successfully reduce this problem. The method
allows reducing the features (genes) in specific SAGE data, and
determines only relevant genes. In this study, we used a genetic
algorithm to implement feature selection, and evaluate the
classification accuracy of the selected features with the K-nearest
neighbor method. In order to validate the proposed method, we used
two SAGE data sets for testing. The results of this study conclusively
prove that the number of features of the original SAGE data set can be
significantly reduced and higher classification accuracy can be
achieved.
Abstract: Computerized lip reading has been one of the most
actively researched areas of computer vision in recent past because
of its crime fighting potential and invariance to acoustic environment.
However, several factors like fast speech, bad pronunciation,
poor illumination, movement of face, moustaches and beards make
lip reading difficult. In present work, we propose a solution for
automatic lip contour tracking and recognizing letters of English
language spoken by speakers using the information available from
lip movements. Level set method is used for tracking lip contour
using a contour velocity model and a feature vector of lip movements
is then obtained. Character recognition is performed using modified
k nearest neighbor algorithm which assigns more weight to nearer
neighbors. The proposed system has been found to have accuracy
of 73.3% for character recognition with speaker lip movements as
the only input and without using any speech recognition system in
parallel. The approach used in this work is found to significantly
solve the purpose of lip reading when size of database is small.
Abstract: By taking advantage of both k-NN which is highly
accurate and K-means cluster which is able to reduce the time of classification, we can introduce Cluster-k-Nearest Neighbor as "variable k"-NN dealing with the centroid or mean point of all subclasses generated by clustering algorithm. In general the algorithm of K-means cluster is not stable, in term of accuracy, for that reason we develop another algorithm for clustering our space which gives a higher accuracy than K-means cluster, less
subclass number, stability and bounded time of classification with respect to the variable data size. We find between 96% and 99.7 % of accuracy in the lassification of 6 different types of Time series by using K-means cluster algorithm and we find 99.7% by using the new clustering algorithm.
Abstract: In order to accelerate the similarity search in highdimensional database, we propose a new hierarchical indexing method. It is composed of offline and online phases. Our contribution concerns both phases. In the offline phase, after gathering the whole of the data in clusters and constructing a hierarchical index, the main originality of our contribution consists to develop a method to construct bounding forms of clusters to avoid overlapping. For the online phase, our idea improves considerably performances of similarity search. However, for this second phase, we have also developed an adapted search algorithm. Our method baptized NOHIS (Non-Overlapping Hierarchical Index Structure) use the Principal Direction Divisive Partitioning (PDDP) as algorithm of clustering. The principle of the PDDP is to divide data recursively into two sub-clusters; division is done by using the hyper-plane orthogonal to the principal direction derived from the covariance matrix and passing through the centroid of the cluster to divide. Data of each two sub-clusters obtained are including by a minimum bounding rectangle (MBR). The two MBRs are directed according to the principal direction. Consequently, the nonoverlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SRtree in processing k-nearest neighbors.
Abstract: Avoiding learning failures in mathematics e-learning environments caused by emotional problems in students with autism has become an important topic for combining of special education with information and communications technology. This study presents an adaptive emotional adjustment model in mathematics e-learning for students with autism, emphasizing the lack of emotional perception in mathematics e-learning systems. In addition, an emotion classification for students with autism was developed by inducing emotions in mathematical learning environments to record changes in the physiological signals and facial expressions of students. Using these methods, 58 emotional features were obtained. These features were then processed using one-way ANOVA and information gain (IG). After reducing the feature dimension, methods of support vector machines (SVM), k-nearest neighbors (KNN), and classification and regression trees (CART) were used to classify four emotional categories: baseline, happy, angry, and anxious. After testing and comparisons, in a situation without feature selection, the accuracy rate of the SVM classification can reach as high as 79.3-%. After using IG to reduce the feature dimension, with only 28 features remaining, SVM still has a classification accuracy of 78.2-%. The results of this research could enhance the effectiveness of eLearning in special education.
Abstract: Human activity is a major concern in a wide variety of
applications, such as video surveillance, human computer interface
and face image database management. Detecting and recognizing
faces is a crucial step in these applications. Furthermore, major
advancements and initiatives in security applications in the past years
have propelled face recognition technology into the spotlight. The
performance of existing face recognition systems declines significantly
if the resolution of the face image falls below a certain level.
This is especially critical in surveillance imagery where often, due to
many reasons, only low-resolution video of faces is available. If these
low-resolution images are passed to a face recognition system, the
performance is usually unacceptable. Hence, resolution plays a key
role in face recognition systems. In this paper we introduce a new
low resolution face recognition system based on mixture of expert
neural networks. In order to produce the low resolution input images
we down-sampled the 48 × 48 ORL images to 12 × 12 ones using
the nearest neighbor interpolation method and after that applying
the bicubic interpolation method yields enhanced images which is
given to the Principal Component Analysis feature extractor system.
Comparison with some of the most related methods indicates that
the proposed novel model yields excellent recognition rate in low
resolution face recognition that is the recognition rate of 100% for
the training set and 96.5% for the test set.
Abstract: In this paper we consider a one-dimensional random
geometric graph process with the inter-nodal gaps evolving according
to an exponential AR(1) process. The transition probability matrix
and stationary distribution are derived for the Markov chains concerning
connectivity and the number of components. We analyze the
algorithm for hitting time regarding disconnectivity. In addition to
dynamical properties, we also study topological properties for static
snapshots. We obtain the degree distributions as well as asymptotic
precise bounds and strong law of large numbers for connectivity
threshold distance and the largest nearest neighbor distance amongst
others. Both exact results and limit theorems are provided in this
paper.