Abstract: Because few artificial immune system (AIS) models address nonlinear problems, nonlinear AIS approaches need to be developed. The Gaussian function is commonly used as a similarity measure in classification and pattern recognition. In this study, breast cancer, the second most widespread cancer type in women, was diagnosed using three distance calculation functions (Euclidean, Gaussian, and a Gaussian-Euclidean hybrid) in the clonal selection model of classical AIS on the Wisconsin Breast Cancer Dataset (WBCD), taken from the University of California, Irvine Machine Learning Repository. We used 3-fold cross-validation to train and test on the dataset. The maximum test classification accuracy, 97.35%, was obtained with the Gaussian-Euclidean hybrid function on fold 3. The mean test classification accuracies were 94.78%, 94.45%, and 95.31% for the Euclidean, Gaussian, and Gaussian-Euclidean functions, respectively. These results suggest that the Gaussian-Euclidean hybrid function is a promising distance calculation method and may be considered as an alternative for hard nonlinear classification problems.
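As a rough illustration of two of the measures named above, the following sketch computes a Euclidean distance and a Gaussian similarity between two feature vectors. The function names, the `sigma` width parameter, and the sample vectors are assumptions made for this example; the abstract does not define how the hybrid function combines the two, so it is not reproduced here.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian similarity in (0, 1]; sigma is an assumed width parameter."""
    d = euclidean_distance(a, b)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


# Hypothetical antibody and antigen vectors, for illustration only.
antibody = [1.0, 2.0, 3.0]
antigen = [1.0, 2.0, 5.0]
print(euclidean_distance(antibody, antigen))
print(gaussian_similarity(antibody, antigen, sigma=2.0))
```

Identical vectors give a distance of 0 and a similarity of 1, so the two measures rank candidates in opposite directions: a smaller distance corresponds to a larger similarity.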
Abstract: A brain-computer interface (BCI) is a communication system that translates brain messages into computer commands. With the help of computer programs, such systems can recognize imagined tasks. Feature extraction is an important stage of EEG classification that affects both the accuracy and the computation time of signal processing. In this study, we process the signal in three steps: active segment selection, fractal feature extraction, and classification. One of the great challenges in BCI applications is to improve classification accuracy and computation time together. We use Student's 2D sample t-statistics on continuous wavelet transforms for active segment selection to reduce the computation time. Next, features are extracted using two well-known fractal dimension estimators of the signal, Katz and Higuchi. In the classification stage, we use the ANFIS (Adaptive Neuro-Fuzzy Inference System), FKNN (Fuzzy K-Nearest Neighbors), LDA (Linear Discriminant Analysis), and SVM (Support Vector Machine) classifiers. We found that active segment selection reduces the computation time and that fractal dimension features with ANFIS on the selected active segments perform best among the investigated methods for EEG classification.
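The Katz estimator mentioned above can be sketched as follows. This is a minimal version under stated assumptions: samples are treated as points (index, amplitude) with unit spacing on the time axis, and the function name `katz_fd` is chosen for this example. The abstract does not say which variant of the estimator the authors used.

```python
import math


def katz_fd(signal):
    """Katz fractal dimension of a 1-D signal.

    Assumes unit spacing between samples, so sample i is the point
    (i, signal[i]). Needs at least three samples.
    """
    n = len(signal) - 1  # number of steps along the curve
    if n < 2:
        raise ValueError("need at least three samples")
    points = list(enumerate(signal))
    # Total curve length: sum of distances between successive points.
    length = sum(math.dist(points[i], points[i + 1]) for i in range(n))
    # Diameter: farthest distance from the first point to any other point.
    diameter = max(math.dist(points[0], p) for p in points[1:])
    return math.log10(n) / (math.log10(n) + math.log10(diameter / length))


print(katz_fd([0, 1, 2, 3, 4]))  # straight line
print(katz_fd([0, 1, 0, 1, 0]))  # zigzag
```

A straight line yields a dimension of about 1, while the zigzag yields a larger value, which is the property that makes the estimate usable as a signal-complexity feature.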
Abstract: Feature selection is the process of selecting the most informative features. It is one of the important steps in knowledge discovery. The problem is that not all genes in gene expression data are important: some may be redundant, and others may be irrelevant or noisy. Here a novel approach, the Hybrid K-Mean-Quick Reduct (KMQR) algorithm, is proposed for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying the K-Means algorithm, so that each cluster contains similar genes. Highly class-discriminative genes are then selected based on their degree of dependence by applying the Quick Reduct algorithm to each cluster. The Average Correlation Value (ACV) is calculated for these genes. Clusters with an ACV of 1 are deemed significant; their classification accuracy is equal to or higher than that of the entire dataset. The proposed algorithm is evaluated and compared using WEKA classifiers. The proposed approach achieves high classification accuracy.
Abstract: The exponential increase in the volume of medical image databases has imposed new challenges on clinical routine in maintaining patient history, diagnosis, treatment, and monitoring. With the advent of data mining and machine learning techniques, it is possible to automate clinical diagnosis or assist physicians with it. In this research, a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization, and classification. In the classification phase, the performance of the traditional k-nearest neighbor (kNN) classifier is improved using a feature weighting scheme and distance-weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57% to 92.85%.
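The combination of feature weighting and distance-weighted voting described above can be sketched as below. This is an illustrative sketch, not the authors' implementation: the inverse-distance vote, the `eps` smoothing term, the function name, and the hand-picked weights are all assumptions. In the paper the weights come from association-rule interestingness measures, which are not reproduced here.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, feature_weights, k=3, eps=1e-9):
    """kNN with per-feature weights and inverse-distance voting."""
    neighbours = []
    for x, label in zip(train_X, train_y):
        # Weighted Euclidean distance: a weight of 0 removes a feature.
        d = math.sqrt(sum(w * (a - b) ** 2
                          for w, a, b in zip(feature_weights, x, query)))
        neighbours.append((d, label))
    neighbours.sort(key=lambda pair: pair[0])

    votes = defaultdict(float)
    for d, label in neighbours[:k]:
        votes[label] += 1.0 / (d + eps)  # closer neighbours count more
    return max(votes, key=votes.get)


# Hypothetical data: feature 0 is informative, feature 1 is noise.
train_X = [[0.9, 9.0], [3.0, 1.0], [3.2, 1.1]]
train_y = ["A", "B", "B"]
query = [1.0, 1.0]

print(weighted_knn_predict(train_X, train_y, query, [1.0, 0.0]))  # A
print(weighted_knn_predict(train_X, train_y, query, [1.0, 1.0]))  # B
```

With the noisy feature suppressed, the single very close "A" neighbour outvotes the two distant "B" neighbours, whereas simple majority voting over the same three neighbours would return "B".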
Abstract: In this paper, an extreme learning machine with an automatic segmentation algorithm is applied to heart disorder classification from heart sound signals. From continuous heart sound signals, the starting points of the first (S1) and second (S2) heart pulses are extracted and corrected using an inter-pulse histogram. From the corrected pulse positions, a single period of the heart sound signal is extracted and converted to a feature vector comprising the mel-scaled filter bank energy coefficients and the envelope coefficients of uniform-sized sub-segments. An extreme learning machine is used to classify the feature vector. In our cardiac disorder classification and detection experiments with 9 cardiac disorder categories, the proposed method performs significantly better than a multi-layer perceptron, a support vector machine, and a hidden Markov model; it achieves a classification accuracy of 81.6% and a detection accuracy of 96.9%.
Abstract: Artificial neural networks (ANNs) can model input-output relationships from raw data. This characteristic makes them invaluable in industry domains where such knowledge is scarce at best. In recent decades, in order to overcome the black-box characteristic of ANNs, researchers have attempted to extract the knowledge embedded within ANNs in the form of rules that can be used in inference systems. This paper presents a new technique that extracts a small set of rules from a two-layer ANN. The extracted rules yield high classification accuracy when implemented within a fuzzy inference system. The technique targets industry domains with less complex problems for which no expert knowledge exists and for which a simpler solution is preferred to a complex one. The proposed technique is more efficient, simpler, and more widely applicable than most previously proposed techniques.
Abstract: Early and accurate detection of Alzheimer's disease (AD) is an important stage in the treatment of individuals suffering from AD. We present an approach based on structural magnetic resonance imaging (sMRI) phase images to distinguish between normal controls (NC), patients with mild cognitive impairment (MCI), and AD patients with a clinical dementia rating (CDR) of 1. Independent component analysis (ICA) is used to extract useful features, which form the inputs to support vector machine (SVM), k-nearest neighbour (kNN), and multilayer artificial neural network (ANN) classifiers that discriminate between the three classes. The results are encouraging in terms of classification accuracy and confirm the usefulness of phase images for classifying the different stages of Alzheimer's disease.
Abstract: Patients with diabetes are susceptible to chronic foot
wounds which may be difficult to manage and slow to heal.
Diagnosis and treatment currently rely on the subjective judgement of
experienced professionals. An objective method of tissue assessment
is required. In this paper, a data fusion approach was taken to wound
tissue classification. The supervised Maximum Likelihood and
unsupervised Multi-Modal Expectation Maximisation algorithms
were used to classify tissues within simulated wound models by
weighting the contributions of both colour and 3D depth information.
It was found that, at low weightings, depth information could show
significant improvements in classification accuracy when compared
to classification by colour alone, particularly when using the
maximum likelihood method. However, larger weightings were
found to have an entirely negative effect on accuracy.
Abstract: Classification is an important topic in machine learning
and bioinformatics. Many datasets have been introduced for
classification tasks. A dataset contains multiple features, and the quality of features influences the classification accuracy of the dataset.
The power of classification for each feature differs. In this study, we
suggest the Classification Influence Index (CII) as an indicator of classification power for each feature. CII enables evaluation of the
features in a dataset and improved classification accuracy by transformation of the dataset. By conducting experiments using CII
and the k-nearest neighbor classifier to analyze real datasets, we confirmed that the proposed index provided meaningful improvement
of the classification accuracy.
Abstract: This paper proposes a novel hybrid algorithm for feature selection based on a binary ant colony and SVM. The final subset is obtained by eliminating features that produce noise or are strictly correlated with other, already selected features. Our algorithm can improve classification accuracy with a small and appropriate feature subset. The proposed algorithm is easy to implement and, because it uses a simple filter, its computational complexity is very low. The performance of the proposed algorithm is evaluated on a real rotary cement kiln dataset. The results show that our algorithm outperforms existing algorithms.
Abstract: This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode, we introduce a data transformation that allows the use of an expensive generalized kernel without additional cost. In order to speed up the decomposition algorithm, we analyze the problem of working set selection for large data sets and the influence of the working set size on the scalability of the parallel decomposition scheme. Our modifications and settings improve support vector learning performance and thus allow extensive parameter search methods to be used to optimize classification accuracy.
Abstract: Globalization and the resulting tightening of competition among companies have increased the importance of well-timed decisions. Strategies that are flexible and adaptive to a changing market stand a greater chance of being effective in the long term. At the same time, a clear focus on managing the entire product lifecycle has emerged as a critical area for investment. Well-organized tools that apply past experience to new cases therefore help managers make sound decisions. Case-based reasoning (CBR) solves a new problem by using or adapting solutions to old problems. In this paper, an adapted CBR model with k-nearest neighbor (K-NN) is employed to provide suggestions for better decision making for a given product in the middle-of-life phase. The set of solutions is weighted by CBR on the principle of group decision making. A wrapper approach based on a genetic algorithm is employed to generate optimal feature subsets. A department store dataset covering various products, collected over two years, was used. A k-fold approach is used to evaluate the classification accuracy rate. Empirical results are compared with the classical case-based reasoning algorithm, which has no special process for feature selection; the CBR-PCA algorithm, which uses filter-based feature selection; and an artificial neural network. The results indicate that, in the specific case studied, the predictive performance of the model is better than that of the two CBR algorithms.
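The k-fold evaluation mentioned above partitions the samples into k test folds and trains on the remainder each time. A minimal sketch of the index split follows; the function name is hypothetical, and the split is contiguous and unshuffled, which is an assumption because the abstract gives neither the value of k nor the sampling scheme.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Folds are contiguous and unshuffled; sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(7, 3):
    print(train, test)
```

The accuracy rate reported for a k-fold run is then typically the mean of the per-fold test accuracies.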
Abstract: The myoelectric signal (MES) is one of the biosignals
used to help humans control equipment. Recent approaches
to MES classification for controlling prosthetic devices with pattern
recognition techniques have revealed two problems. First, the
classification performance of the system degrades as the number of
motion classes increases. Second, the additional complicated
methods used to address the first problem
increase the computational cost of a multifunction myoelectric
control system. In an effort to solve these problems and to achieve a
feasible design for real time implementation with high overall
accuracy, this paper presents a new method for feature extraction in
MES recognition systems. The method works by extracting features
using Wavelet Packet Transform (WPT) applied on the MES from
multiple channels, and then employs the Fuzzy c-means (FCM)
algorithm to generate a measure of each feature's suitability for
classification. Finally, Principal Component Analysis (PCA) is
utilized to reduce the size of the data before computing the
classification accuracy with a multilayer perceptron neural network.
The proposed system produces powerful classification results (99%
accuracy) by using only a small portion of the original feature set.
Abstract: Tumor classification is a key area of research in the
field of bioinformatics. Microarray technology is commonly used in
the study of disease diagnosis using gene expression levels. The
main drawback of gene expression data is that it contains thousands
of genes and very few samples. Feature selection methods are used
to select the informative genes from the microarray. These methods
considerably improve the classification accuracy. In the proposed
method, Genetic Algorithm (GA) is used for effective feature
selection. Informative genes are identified based on the T-Statistics,
Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate
solutions of GA are obtained from top-m informative genes. The
classification accuracy of k-Nearest Neighbor (kNN) method is used
as the fitness function for GA. In this work, kNN and Support Vector
Machine (SVM) are used as the classifiers. The experimental results
show that the proposed work is suitable for effective feature
selection. With the help of the selected genes, the GA-kNN method
achieves 100% accuracy in 4 datasets and the GA-SVM method
achieves it in 5 out of 10 datasets. The GA combined with kNN and SVM
is demonstrated to be an accurate approach for microarray-based
tumor classification.
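The step of picking the top-m informative genes can be sketched with a signal-to-noise score. The abstract does not give its SNR formula, so this sketch assumes the commonly used two-class form, the difference of class means divided by the sum of class standard deviations. The function names and the toy expression values are hypothetical.

```python
import statistics


def snr_score(class_a, class_b):
    """Assumed SNR: (mean_a - mean_b) / (std_a + std_b).

    Uses the sample standard deviation; fails with a division by zero
    if both classes are constant.
    """
    return ((statistics.mean(class_a) - statistics.mean(class_b))
            / (statistics.stdev(class_a) + statistics.stdev(class_b)))


def top_m_genes(expression, labels, m):
    """Rank genes by absolute SNR and return the names of the best m.

    expression maps gene name -> list of values, one per sample;
    labels holds 0 or 1 for each sample.
    """
    scores = {}
    for gene, values in expression.items():
        positives = [v for v, y in zip(values, labels) if y == 1]
        negatives = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(snr_score(positives, negatives))
    return sorted(scores, key=scores.get, reverse=True)[:m]


# Toy data: g1 separates the classes well, g2 not at all, g3 weakly.
expression = {
    "g1": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
    "g2": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],
    "g3": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
}
labels = [0, 0, 0, 1, 1, 1]
print(top_m_genes(expression, labels, 2))
```

A ranking of this kind would only seed the initial GA population; the GA's fitness, as the abstract states, is the kNN classification accuracy.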
Abstract: A spatial classification technique incorporating a state-of-the-art feature extraction algorithm is proposed in this paper for classifying the heterogeneous classes present in hyperspectral images. Classification accuracy can be improved only if both the feature extraction and the classifier selection are appropriate. As the classes in hyperspectral images are assumed to have different textures, textural classification is adopted. Run Length feature extraction is used along with Principal Components and Independent Components. A hyperspectral image of the Indiana site taken by AVIRIS is used for the experiment. From the original 220 bands, a subset of 120 bands is selected. The Gray Level Run Length Matrix (GLRLM) is calculated for forty of the selected bands, and from the GLRLMs the Run Length features for individual pixels are calculated. Principal Components are calculated for another forty bands, and Independent Components for the remaining forty. As Principal and Independent Components can represent the textural content of pixels, they are treated as features. The Run Length features, Principal Components, and Independent Components together form the Combined Features, which are used for classification. An SVM with a Binary Hierarchical Tree is used to classify the hyperspectral image. Results are validated against ground truth, and accuracies are calculated.
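To make the GLRLM step concrete, the sketch below builds a run length matrix for horizontal runs over a small quantized patch and derives one common run length feature, short run emphasis. The direction, the patch, and the choice of feature are assumptions for illustration; the abstract does not state which directions, window sizes, or features were used.

```python
def glrlm_horizontal(image, levels):
    """Gray Level Run Length Matrix for horizontal (0 degree) runs.

    matrix[g][r - 1] counts the runs of gray level g with length r.
    image is a list of rows holding integer levels in [0, levels).
    """
    width = len(image[0])
    matrix = [[0] * width for _ in range(levels)]
    for row in image:
        run_value, run_length = row[0], 1
        for pixel in row[1:]:
            if pixel == run_value:
                run_length += 1
            else:
                matrix[run_value][run_length - 1] += 1
                run_value, run_length = pixel, 1
        matrix[run_value][run_length - 1] += 1  # close the last run in the row
    return matrix


def short_run_emphasis(matrix):
    """Weight each run by 1 / length^2 and normalise by the number of runs."""
    total_runs = sum(sum(row) for row in matrix)
    weighted = sum(count / (j + 1) ** 2
                   for row in matrix for j, count in enumerate(row))
    return weighted / total_runs


patch = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
    [0, 2, 2, 0],
]
m = glrlm_horizontal(patch, levels=3)
print(m)  # [[3, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]]
print(short_run_emphasis(m))
```

To obtain a feature for each pixel, the same computation would be repeated over a window centred on that pixel.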
Abstract: This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared-memory parallel mode, we introduce a transformation of the training data that allows the use of an expensive generalized kernel without additional cost. We present experiments for the Gaussian kernel, but other kernel functions can be used as well. In order to further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set size on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and improved overall support vector machine learning performance. Our method allows extensive parameter search methods to be used to optimize classification accuracy.
Abstract: Since dealing with high dimensional data is
computationally complex and sometimes even intractable,
several feature reduction methods have recently been developed to
reduce the dimensionality of the data and simplify the
analysis in various applications such as text categorization, signal
processing, image retrieval, and gene expression. Among feature
reduction techniques, feature selection is one of the most popular
methods because it preserves the original features.
In this paper, we propose a new unsupervised feature selection
method which will remove redundant features from the original
feature space by the use of probability density functions of various
features. To show the effectiveness of the proposed method, popular
feature selection methods have been implemented and compared.
Experimental results on several datasets from the UCI
repository illustrate the effectiveness of the proposed
method in comparison with the other methods in terms of
both classification accuracy and the number of selected features.
Abstract: Alzheimer's disease (AD) involves the loss of mental functions
such as thinking, memory, and reasoning that is severe enough to
interfere with a person's daily functioning. The symptoms of AD
that appear depend on which part of the brain is affected by
infection or damage. In this case,
MRI is the best biomedical instrument that can be used to
detect the presence of AD. Therefore, this paper proposes a fusion
method to distinguish between normal and AD MRIs. In this
combined method, around 27 MRIs collected from Jordanian
hospitals are analyzed using low-pass morphological
filters; statistical outputs are extracted through the intensity
histogram and summarized with a descriptive box plot. In addition, an
artificial neural network (ANN) is applied to test the performance of
this approach. Finally, the result of a t-test at the 95% confidence
level is compared with the classification accuracy of the ANN
(100%). The developed method is robust and can be used
effectively to diagnose and determine the type of AD image.
Abstract: This paper proposes to use ETM+ multispectral data
and panchromatic band as well as texture features derived from the
panchromatic band for land cover classification. Four texture features
including one 'internal texture' and three GLCM based textures
namely correlation, entropy, and inverse difference moment were used
in combination with ETM+ multispectral data. Two data sets
involving combination of multispectral, panchromatic band and its
texture were used and results were compared with those obtained by
using multispectral data alone. A decision tree classifier with and
without boosting was used to classify the different datasets. Results
from this study suggest that the dataset consisting of panchromatic
band, four of its texture features and multispectral data was able to
increase the classification accuracy by about 2%. In comparison, a
boosted decision tree was able to increase the classification accuracy
by about 3% with the same dataset.
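The three GLCM-based textures named above can be computed from a normalised co-occurrence matrix, as in the sketch below. The pixel offset (one step to the right), the natural logarithm in the entropy, and the two-level toy image are assumptions for illustration; the abstract does not specify the offsets, quantization, or window size used on the panchromatic band.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised gray level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Correlation, entropy, and inverse difference moment of a GLCM.

    Correlation is undefined (division by zero) for a constant image.
    """
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    return {
        "correlation": sum((i - mu_i) * (j - mu_j) * v
                           for i, j, v in cells) / (sd_i * sd_j),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


image = [[0, 0, 1, 1]] * 4  # toy two-level image, four identical rows
features = texture_features(glcm(image, levels=2))
print(features)
```

For this toy image the co-occurrence probabilities are 1/3 each for the pairs (0,0), (0,1), and (1,1), which gives an entropy of ln 3, an inverse difference moment of 5/6, and a correlation of 0.5.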
Abstract: Serial Analysis of Gene Expression is a powerful
quantification technique for generating cell or tissue gene expression
data. The profile of the gene expression of cell or tissue in several
different states is difficult for biologists to analyze because of the large
number of genes typically involved. However, feature selection in
machine learning can mitigate this problem. The method
reduces the number of features (genes) in specific SAGE data and
retains only the relevant genes. In this study, we used a genetic
algorithm to implement feature selection and evaluated the
classification accuracy of the selected features with the K-nearest
neighbor method. In order to validate the proposed method, we used
two SAGE data sets for testing. The results of this study
prove that the number of features of the original SAGE data set can be
significantly reduced and higher classification accuracy can be
achieved.
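A genetic algorithm that selects features using nearest-neighbour accuracy as its fitness can be sketched as follows. Everything specific here is an assumption made for illustration: leave-one-out 1-NN as the fitness, tournament selection, one-point crossover, bit-flip mutation, single-individual elitism, the parameter defaults, and the toy data. The abstract does not describe the operators or settings the authors used.

```python
import random


def knn_loo_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=8, generations=20, p_mut=0.1, seed=0):
    """Return (best_mask, best_fitness, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return knn_loo_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: keep the current best unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)  # tournament of two
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(0, n)                   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best), history


# Toy data: feature 0 separates the classes, feature 1 misleads 1-NN.
X = [[0.0, 5.0], [0.2, 0.0], [1.0, 5.1], [1.2, 0.1]]
y = [0, 0, 1, 1]
best_mask, best_fit, history = ga_select(X, y)
print(best_mask, best_fit)
```

On the toy data, keeping only feature 0 classifies every sample correctly under leave-one-out, while keeping both features classifies none correctly, which illustrates how a reduced subset can give higher accuracy than the full set.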