Abstract: Kernel functions, which allow the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, have enabled Support Vector Machines (SVM) to be applied successfully in many fields, e.g. classification and regression. The importance of kernels has motivated many studies on their composition. It is well known that the reproducing kernel (R.K) is a useful kernel function with many properties, e.g. positive definiteness, the reproducing property, and the ability to compose complex R.Ks by simple operations. There are two popular ways to compute an R.K in explicit form. One is to construct and solve a specific differential equation with boundary values; its handicap is that it cannot yield a unified form of the R.K. The other uses a piecewise integral of the Green function associated with a differential operator L. The latter facilitates both the computation of an R.K in a unified explicit form and theoretical analysis, but it has been studied more recently and has seen fewer practical computations. In this paper, a new algorithm for computing an R.K is presented. It obtains the unified explicit form of the R.K in a general reproducing kernel Hilbert space (RKHS). It avoids constructing and solving complex differential equations manually and enables automatic, flexible and rigorous computation for more general RKHSs. To validate that the R.K computed by the algorithm works well in SVM, some illustrative examples and a comparison between the R.K and the Gaussian kernel (RBF) in support vector regression are presented. The results show that the performance of the R.K is close or slightly superior to that of the RBF.
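The statement that a kernel lets a dot-product algorithm work in a nonlinear feature space can be illustrated with a minimal sketch. It uses the textbook degree-2 polynomial kernel on 2-D inputs, which is an assumption chosen for illustration and is not the reproducing kernel constructed in the paper; the function names are hypothetical.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Plain Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)                          # evaluated in input space
k_explicit = dot(poly2_features(x), poly2_features(y))   # evaluated in feature space
print(k_implicit, k_explicit)
```

Both values agree up to floating-point rounding, which is the property that allows an SVM to use the kernel without ever forming the feature vectors.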
Abstract: Support Vector Machine (SVM) is a statistical
learning tool that has evolved into the more complex concept of
structural risk minimization (SRM). In this paper, SVM is
applied to signal detection in communication systems in the
presence of channel noise in various environments in the form
of Rayleigh fading, additive white Gaussian background noise
(AWGN), and interference noise generalized as additive colored
Gaussian noise (ACGN). The structure and performance of
SVM in terms of the bit error rate (BER) metric are derived and
simulated for these advanced stochastic noise models and the
computational complexity of the implementation, in terms of
average computational time per bit, is also presented. The
performance of SVM is then compared to that of conventional
optimal model-based detectors for binary signaling driven by binary
phase shift keying (BPSK) modulation. We show that the
SVM performance is superior to that of conventional matched
filter-, innovation filter-, and Wiener filter-driven detectors,
even in the presence of random Doppler carrier deviation,
especially for low SNR (signal-to-noise ratio) ranges. For
large SNR, the performance of the SVM was similar to that of
the classical detectors. However, the convergence between
SVM and maximum likelihood detection occurred at a higher
SNR as the noise environment became more hostile.
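As a point of reference for the BER comparison, the following minimal sketch evaluates the textbook bit error rate of coherent BPSK with a matched filter in AWGN only, BER = 0.5 * erfc(sqrt(Eb/N0)). It does not model Rayleigh fading, colored noise or Doppler deviation, and it does not reproduce the paper's results; the function name and the chosen SNR values are assumptions.

```python
import math


def bpsk_awgn_ber(ebn0_db):
    """Theoretical BER of coherent BPSK in AWGN for a given Eb/N0 in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)          # convert dB to a linear ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))  # equals Q(sqrt(2 * Eb/N0))


# Illustrative SNR points (hypothetical, not taken from the paper).
for snr_db in (0.0, 4.0, 8.0):
    print(f"Eb/N0 = {snr_db:4.1f} dB -> BER = {bpsk_awgn_ber(snr_db):.3e}")
```

A learned detector such as an SVM would typically be compared against a curve like this one at each SNR point.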
Abstract: It is hard to perceive the interaction process with machines when visual information is not available. In this paper, we address this issue by providing interaction through visual techniques. Posture recognition is performed for American Sign Language (ASL) to recognize static alphabets and numbers. 3D information is exploited to segment the hands and face using a normal (Gaussian) distribution and depth information. Features for posture recognition are computed from statistical and geometrical properties that are translation, rotation and scale invariant. Hu moments as statistical features, and circularity and rectangularity as geometrical features, are combined to build the feature vectors. These feature vectors are used to train an SVM classifier that recognizes static alphabets and numbers. For the alphabets, curvature analysis is carried out to reduce misclassifications. The experimental results show that the proposed system recognizes posture symbols with recognition rates of 98.65% and 98.6% for ASL alphabets and numbers, respectively.
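The two geometrical features can be sketched for a contour given as a polygon. The definitions used here (circularity = 4*pi*area/perimeter^2, rectangularity = area divided by bounding-box area) are common conventions and an assumption, since the abstract does not state its formulas. The axis-aligned bounding box used below is also a simplification: a rotation-invariant version would use the minimum-area bounding rectangle.

```python
import math


def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0


def polygon_perimeter(pts):
    """Sum of the edge lengths of the closed polygon."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def circularity(pts):
    """4*pi*A / P^2: equals 1 for a circle, smaller for less compact shapes."""
    return 4.0 * math.pi * polygon_area(pts) / polygon_perimeter(pts) ** 2


def rectangularity(pts):
    """Area divided by the area of the axis-aligned bounding box (simplification)."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return polygon_area(pts) / ((max(xs) - min(xs)) * (max(ys) - min(ys)))


# Hypothetical contour: a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = (circularity(square), rectangularity(square))
print(features)
```

For the unit square the circularity is pi/4 and the rectangularity is 1; such values would be appended to the Hu moments to form a feature vector.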
Abstract: Protein domain structure has been widely used as the most informative sequence feature for computationally predicting protein-protein interactions. However, in a recent study, a research group reported a very high accuracy of 94% using the hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as sequence features. Using Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved an accuracy of nearly 80%. Furthermore, domain structure had a receiver operating characteristic (ROC) score of 0.8480 with a running time of 34 seconds, while hydrophobicity had a ROC score of 0.8159 with a running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interactions can be predicted from domain structure with reliable accuracy and acceptable running time.
Abstract: This paper proposes a novel system for monitoring the
health of underground pipelines. Some of these pipelines transport
dangerous contents and any damage incurred might have catastrophic
consequences. However, most of this damage is unintentional and
usually a result of surrounding construction activities. In order to
prevent such damage, monitoring systems are
indispensable. This paper focuses on acoustically recognizing road
cutters since they precede most construction activities in modern
cities. Acoustic recognition can be easily achieved by installing a
distributed computing sensor network along the pipelines and using
smart sensors to "listen" for potential threats; if there is a real threat,
some form of alarm is raised. For efficient pipeline monitoring, a novel
monitoring approach is proposed. Principal Component Analysis
(PCA) was studied and applied. Eigenvalues were regarded as the
special signature that could characterize a sound sample, and were
thus used for the feature vector for sound recognition. The denoising
ability of PCA could make it robust to noise interference. A one-class
SVM was used as the classifier. On-site experimental results show that the
proposed PCA and SVM based acoustic recognition system can be
very effective, with a low tendency for raising false alarms.
Abstract: Previously, harmonic parameters (HPs) have been
selected as features extracted from EEG signals for automatic sleep
scoring. However, in previous studies, only one HP parameter was
used, which was directly extracted from the whole epoch of the EEG
signal.
In this study, two different transformations were applied to extract
HPs from EEG signals: Hilbert-Huang transform (HHT) and wavelet
transform (WT). EEG signals were decomposed by the two
transformations, and features were extracted from the different
components. Twelve parameters (four sets of HPs) were extracted.
Some of the parameters differ markedly among the different stages.
Afterward, HPs from the two transformations were used to build a
rough sleep stage scoring model using an SVM classifier. The
performance of this model is about 78% using the features obtained by
our proposed extractions. Our results suggest that these features may
be useful for automatic sleep stage scoring.
Abstract: The classification of sonar signals has been taken up as a challenging task for neural networks. This paper investigates the design of an optimal classifier using a Multilayer Perceptron Neural Network (MLP NN) and Support Vector Machines (SVM). Results obtained using sonar data sets suggest that the SVM classifier performs well in comparison with the well-known MLP NN classifier. On the test instances, an average classification accuracy of 91.974% is achieved with the SVM classifier and 90.3609% with the MLP NN classifier. The area under the Receiver Operating Characteristic (ROC) curve for the proposed SVM classifier on the test data set is 0.981183, which is very close to unity and confirms the excellent quality of the proposed classifier. The SVM classifier employed in this paper is implemented using the kernel Adatron algorithm and is seen to be robust and relatively insensitive to parameter initialization in comparison to the MLP NN.
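The area under the ROC curve reported above can be computed directly from classifier scores. The sketch below uses the rank-statistic definition: the AUC is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are made-up illustrative values, not the sonar data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision values from a classifier on six test instances.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9; a value near 1, as in the abstract, means almost every pair is ordered correctly.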
Abstract: This paper gives a novel method for improving
classification performance for cancer classification with very few
microarray gene expression data. The method employs classification
with individual gene ranking and gene subset ranking. For selection
and classification, the proposed method uses the same classifier. The
method is applied to three publicly available cancer gene expression
datasets: Lymphoma, Liver and Leukaemia. Three
different classifiers, namely Support Vector Machines one-against-all
(SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant
Analysis (LDA), were tested. The results indicate an improvement
in the performance of the SVM-OAA classifier, with satisfactory results on
all three datasets, when compared with the other two classifiers.
Abstract: Understanding protein functions is a major goal in
the post-genomic era. Proteins usually work in the context of other
proteins and rarely function alone. Therefore, it is highly relevant to
study the interaction partners of a protein in order to understand its
function. Machine learning techniques have been widely applied to
predict protein-protein interactions. Kernel functions play an
important role in a successful machine learning technique. Choosing
the appropriate kernel function can lead to better accuracy in a
binary classifier such as the support vector machine. In this paper,
we describe a Bayesian kernel for the support vector machine to
predict protein-protein interactions. The use of a Bayesian kernel can
improve the classifier performance by incorporating the probability
characteristics of the available experimental protein-protein
interaction data that were compiled from different sources. In
addition, the probabilistic output from the Bayesian kernel can assist
biologists in conducting further research on the most confidently predicted
interactions. The results show that the accuracy of the classifier
improves with the Bayesian kernel compared to the standard
SVM kernels. These results imply that protein-protein interactions can
be predicted using a Bayesian kernel with better accuracy than with
the standard SVM kernels.
Abstract: Current face recognition systems often
use either SVM or Adaboost techniques for the face detection part and
PCA for the face recognition part. In this paper, we offer a novel method
that provides not only a powerful face detection system based on
Six-segment-filters (SSR) and Adaboost learning algorithms but also
a face recognition system. A new exclusive face detection
algorithm has been developed and connected with the recognition
algorithm. As a result, we obtained high overall system
performance compared with current systems. The proposed algorithm
was tested on the CMU, FERET, UNIBE and MIT face databases, and
significant performance was obtained.
Abstract: This paper describes a new supervised fusion (hybrid)
electrocardiogram (ECG) classification solution consisting of a new
QRS complex geometrical feature extraction as well as a new version
of the learning vector quantization (LVQ) classification algorithm
aimed at overcoming the stability-plasticity dilemma. Toward this
objective, after detection and delineation of the major events of the ECG
signal via an appropriate algorithm, each QRS region and its
corresponding discrete wavelet transform (DWT) are treated as
virtual images, and each of them is divided into eight polar sectors.
Then, the curve length of each excerpted segment is calculated
and used as an element of the feature space. To increase the
robustness of the proposed classification algorithm against noise,
artifacts and arrhythmic outliers, a fusion structure consisting of
five different classifiers, namely a Support Vector Machine (SVM), a
Modified Learning Vector Quantization (MLVQ) and three Multi
Layer Perceptron-Back Propagation (MLP–BP) neural networks with
different topologies, was designed and implemented. The proposed
algorithm was applied to all 48 MIT–BIH Arrhythmia Database
records (within–record analysis), and the discrimination power of the
classifier in isolating the different beat types of each record was
assessed; as a result, an average accuracy of Acc=98.51%
was obtained. The proposed method was also applied to six
beat types (Normal, LBBB, RBBB, PVC, APB, PB) belonging
to 20 different records of the aforementioned database (between–
record analysis), and an average accuracy of Acc=95.6% was achieved.
To evaluate performance quality of the new proposed hybrid learning
machine, the obtained results were compared with similar peer–
reviewed studies in this area.
Abstract: Prediction of bacterial virulent protein sequences can
assist the identification and characterization of novel
virulence-associated factors and the discovery of drug/vaccine targets against
proteins indispensable to pathogenicity. Gene Ontology (GO)
annotation, which describes functions of genes and gene products as a
controlled vocabulary of terms, has been shown to be effective for a
variety of tasks such as gene expression study, GO annotation
prediction, protein subcellular localization, etc. In this study, we
propose a sequence-based method Virulent-GO by mining informative
GO terms as features for predicting bacterial virulent proteins.
Each protein in the datasets used by the existing method
VirulentPred is annotated by using BLAST to obtain its homologs
with known accession numbers for retrieving GO terms. After
investigating various popular classifiers using the same five-fold
cross-validation scheme, Virulent-GO, using a single kind of GO
term feature, achieves an accuracy of 82.5%, slightly better than
VirulentPred with 81.8% using five kinds of sequence-based features.
In the independent test evaluation, Virulent-GO also yields better
results (82.0%) than VirulentPred (80.7%). When evaluating a single
kind of feature with SVM, the GO term feature performs much better
than each of the five kinds of features.
Abstract: Corporate credit rating prediction using statistical and
artificial intelligence (AI) techniques has been one of the attractive
research topics in the literature. In recent years, multiclass
classification models such as artificial neural network (ANN) or
multiclass support vector machine (MSVM) have become
very appealing machine learning approaches due to their good
performance. However, most of them have only focused on classifying
samples into nominal categories, thus the unique characteristic of the
credit rating - ordinality - has been seldom considered in their
approaches. This study proposes new types of ANN and MSVM
classifiers, which are named OMANN and OMSVM respectively.
OMANN and OMSVM are designed to extend binary ANN or SVM
classifiers by applying the ordinal pairwise partitioning (OPP) strategy.
These models can handle ordinal multiple classes efficiently and
effectively. To validate the usefulness of these two models, we applied
them to a real-world bond rating case. We compared the results of
our models to those of conventional approaches. The experimental
results showed that our proposed models improve classification
accuracy in comparison to typical multiclass classification techniques
while using fewer computational resources.
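The abstract does not spell out how the ordered classes are partitioned, so the following is only a sketch of one scheme that is sometimes described for ordinal partitioning, here called "one-against-followers": for K ordered classes it builds K-1 binary problems, each separating one class from all classes ranked after it. The function name, the rating labels and the toy samples are hypothetical, and the scheme itself is an assumption rather than the authors' confirmed method.

```python
def one_against_followers(samples, ordered_classes):
    """Build K-1 binary datasets: class k (label 1) vs. all later classes (label 0).

    samples: list of (features, class_label) pairs.
    ordered_classes: class labels listed from highest to lowest rank.
    """
    problems = []
    for i, cls in enumerate(ordered_classes[:-1]):
        followers = set(ordered_classes[i + 1:])
        subset = [(x, 1 if y == cls else 0)
                  for x, y in samples
                  if y == cls or y in followers]
        problems.append((cls, subset))
    return problems


# Hypothetical one-feature samples with four ordered rating grades.
samples = [((0.9,), "A"), ((0.7,), "B"), ((0.5,), "C"), ((0.2,), "D"), ((0.8,), "A")]
problems = one_against_followers(samples, ["A", "B", "C", "D"])
for cls, subset in problems:
    print(cls, [label for _, label in subset])
```

Each binary dataset would then be handed to a binary ANN or SVM; later problems contain fewer samples because already-separated classes are dropped, which is one way such a scheme can reduce computation.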
Abstract: Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory requirements can be prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We show that the best results are obtained with the SVM_FS and GA_FS methods for a relatively small feature vector dimension, compared with the IG method, which needs longer vectors to reach quite similar classification accuracies. We also present a novel method to better correlate the SVM kernel's parameters (Polynomial or Gaussian kernel).
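Of the four feature selection methods, Information Gain has a compact closed form, sketched below for a binary "term present / absent" feature: IG = H(C) minus the weighted entropy of the classes after splitting on the term. The toy documents and term names are hypothetical, and the paper's exact IG variant is not given in the abstract.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(term_present, labels):
    """Reduction in class entropy obtained by splitting on term presence."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [l for t, l in zip(term_present, labels) if t == value]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain


# Four hypothetical documents and two candidate terms.
labels = ["sport", "sport", "politics", "politics"]
ig_goal = information_gain([True, True, False, False], labels)  # perfectly predictive
ig_the = information_gain([True, False, True, False], labels)   # uninformative
print(ig_goal, ig_the)
```

Terms would be ranked by this score and only the top-scoring ones kept, which shortens the document vector before the classifier is trained.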
Abstract: In this paper, Principal Component Analysis (PCA) and Kernel Principal
Component Analysis (KPCA) are combined with Elman neural
network and Support Vector Machine (SVM) classifiers to categorize
face pictures from the ORL database. The Elman network, a recurrent neural network, is proposed
for modeling storage systems, and it is also used to examine the
effect of the number of PCA components on the categorization precision
and on the categorization time for the database pictures. The categorization stages are
conducted with various numbers of components, and the results
obtained with the Elman neural network and the support vector
machine are compared. In the best case, a recognition
accuracy of 97.41% is obtained.
Abstract: Support Vector Machine (SVM) is a statistical learning tool that was initially developed by Vapnik in 1979 and later developed into the more complex concept of structural risk minimization (SRM). SVM is playing an increasing role in detection problems in various engineering fields, notably in statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM was applied to the detection of medical ultrasound images in the presence of partially developed speckle noise. The simulation was done for single-look and multi-look speckle models to give a complete overview of and insight into the newly proposed SVM-based detector. The structure of the SVM was derived and applied to clinical ultrasound images, and its performance in terms of the mean square error (MSE) metric was calculated. We showed that the SVM-detected ultrasound images have a very low MSE and are of good quality. The quality of the processed speckled images improved for the multi-look model. Furthermore, the contrast of the SVM-detected images was higher than that of the original non-noisy images, indicating that the SVM approach increased the distance between the pixel reflectivity levels (detection hypotheses) in the original images.
Abstract: Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the four different named entity (NE) classes: Person name, Location name, Organization name and Miscellaneous name. We have used annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm to generate lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as features of the SVM in order to improve the system performance. The NER system has been tested with gold standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation results have demonstrated recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show an improvement in the f-score of 5.13% with the use of context patterns. A statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
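The reported f-scores are consistent with the usual balanced F-measure, the harmonic mean of precision and recall. The short check below assumes that definition (the abstract does not state which F-measure was used) and plugs in the recall and precision figures quoted above; the function name is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall (in percent) as quoted in the abstract.
bengali_f = f_score(80.12, 88.61)
hindi_f = f_score(74.34, 80.23)
print(f"Bengali: {bengali_f:.2f}  Hindi: {hindi_f:.2f}")
```

Both values round to the figures stated in the abstract (84.15 and 77.17).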
Abstract: In recent years, rapid advances in software and hardware in the field of information technology, along with a digital imaging revolution in the medical domain, have facilitated the generation and storage of large collections of images by hospitals and clinics. Searching these large image collections effectively and efficiently poses significant technical challenges and raises the necessity of constructing intelligent retrieval systems. Content-based Image Retrieval (CBIR) consists of retrieving the images most visually similar to a given query image from a database of images [5]. Medical CBIR applications pose unique challenges but at the same time offer many new opportunities. On one hand, while one can easily understand news or sports videos, a medical image is often completely incomprehensible to untrained eyes.
Abstract: This paper describes an optimal approach to feature
subset selection for classifying leaves, based on a Genetic Algorithm
(GA) and Kernel Based Principal Component Analysis (KPCA). Due
to the high complexity of selecting the optimal features,
classification has become a critical task in analysing leaf image
data. Initially the shape, texture and colour features are extracted
from the leaf images. These extracted features are optimized through
the separate functioning of GA and KPCA. This approach performs
an intersection operation over the subsets obtained from the
optimization process. Finally, the most common matching subset is
forwarded to train the Support Vector Machine (SVM). Our
experimental results show that applying GA
and KPCA for feature subset selection, with SVM as the classifier, is
computationally effective and improves the accuracy of the classifier.
Abstract: To extract the important physiological factors related to
diabetes from an oral glucose tolerance test (OGTT) by mathematical
modeling, highly informative but convenient protocols are required.
Current models require a large number of samples and an extended
period of testing, which is not practical for daily use. The purpose
of this study is to make model assessments possible even from a
reduced number of samples taken over a relatively short period.
For this purpose, test values were extrapolated using a support
vector machine. A good correlation was found between reference and
extrapolated values in the 741 OGTTs evaluated. This result indicates
that a reduction in the number of clinical tests is possible through a
computational approach.