A New Composition Method of Admissible Support Vector Kernel Based on Reproducing Kernel

The kernel function, which allows the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, has enabled Support Vector Machines (SVM) to be applied successfully in many fields, e.g. classification and regression. The importance of the kernel has motivated many studies on its composition. It is well known that the reproducing kernel (R.K) is a useful kernel function with many properties, e.g. positive definiteness, the reproducing property, and the ability to compose complex R.Ks by simple operations. There are two popular ways to compute an R.K in explicit form. One is to construct and solve a specific differential equation with boundary values; its handicap is that it cannot yield a unified form of the R.K. The other is to use a piecewise integral of the Green function associated with a differential operator L. The latter facilitates both theoretical analysis and the computation of an R.K with a unified explicit form, but it has been studied only relatively recently and has seen few practical computations. In this paper, a new algorithm for computing an R.K is presented. It can obtain the unified explicit form of the R.K in a general reproducing kernel Hilbert space (RKHS). It avoids constructing and solving complex differential equations manually and allows automatic, flexible and rigorous computation for more general RKHSs. To validate that the R.K computed by the algorithm works well in SVM, some illustrative examples and a comparison between the R.K and the Gaussian kernel (RBF) in support vector regression are presented. The results show that the performance of the R.K is close to or slightly better than that of the RBF.

Least Square-SVM Detector for Wireless BPSK in Multi-Environmental Noise

Support Vector Machine (SVM) is a statistical learning tool that has developed into the more complex concept of structural risk minimization (SRM). In this paper, SVM is applied to signal detection in communication systems in the presence of channel noise in various environments, in the form of Rayleigh fading, additive white Gaussian background noise (AWGN), and interference noise generalized as additive colored Gaussian noise (ACGN). The structure and performance of SVM in terms of the bit error rate (BER) metric are derived and simulated for these advanced stochastic noise models, and the computational complexity of the implementation, in terms of average computational time per bit, is also presented. The performance of SVM is then compared to that of the conventional optimal model-based detector for binary signaling driven by binary phase shift keying (BPSK) modulation. We show that the SVM performance is superior to that of conventional matched filter-, innovation filter-, and Wiener filter-driven detectors, even in the presence of random Doppler carrier deviation, especially for low signal-to-noise ratio (SNR) ranges. For large SNR, the performance of the SVM was similar to that of the classical detectors. However, the convergence between SVM and maximum likelihood detection occurred at a higher SNR as the noise environment became more hostile.

Posture Recognition using Combined Statistical and Geometrical Feature Vectors based on SVM

It is hard to perceive the interaction process with machines when visual information is not available. In this paper, we address this issue by providing interaction through visual techniques. Posture recognition is performed for American Sign Language (ASL) to recognize static alphabets and numbers. 3D information is exploited to segment the hands and face using a normal Gaussian distribution and depth information. Features for posture recognition are computed from statistical and geometrical properties that are translation, rotation and scale invariant. Hu moments as statistical features, and circularity and rectangularity as geometrical features, are combined to build the feature vectors. These feature vectors are used to train an SVM classifier that recognizes static alphabets and numbers. For the alphabets, curvature analysis is carried out to reduce misclassifications. The experimental results show that the proposed system recognizes posture symbols with recognition rates of 98.65% and 98.6% for ASL alphabets and numbers, respectively.
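The two geometrical features named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions used, so the formulas here are assumptions: circularity is taken as 4πA/P², and rectangularity as the ratio of the shape's area to the area of its axis-aligned bounding box (a minimum-area rotated rectangle would be needed for true rotation invariance). The function names and the polygon inputs are hypothetical.

```python
import math

def polygon_area(points):
    """Area of a simple polygon (list of (x, y) vertices) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of the edge lengths of the closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))

def circularity(points):
    # Assumed definition: 4*pi*A / P^2, which is 1 for a circle and
    # smaller for less compact shapes.
    return 4.0 * math.pi * polygon_area(points) / polygon_perimeter(points) ** 2

def rectangularity(points):
    # Assumed definition: shape area divided by the area of the
    # axis-aligned bounding box (not the minimum rotated rectangle).
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return polygon_area(points) / box_area

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
triangle = [(0, 0), (4, 0), (0, 3)]
print(circularity(square), rectangularity(square))
print(circularity(triangle), rectangularity(triangle))
```

In a full pipeline these two values would be appended to the Hu moments to form the feature vector passed to the SVM; that step is not shown here.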

Comparison of Domain and Hydrophobicity Features for the Prediction of Protein-Protein Interactions using Support Vector Machines

The protein domain structure has been widely used as the most informative sequence feature for computationally predicting protein-protein interactions. However, a recent study by another research group reported a very high accuracy of 94% using the hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as sequence features. Using Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved an accuracy of nearly 80%. Furthermore, domain structure had a receiver operating characteristic (ROC) score of 0.8480 with a running time of 34 seconds, while hydrophobicity had a ROC score of 0.8159 with a running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interactions can be predicted from domain structure with reliable accuracy and acceptable running time.

An Automatic Pipeline Monitoring System Based on PCA and SVM

This paper proposes a novel system for monitoring the health of underground pipelines. Some of these pipelines transport dangerous contents, and any damage incurred might have catastrophic consequences. However, most of this damage is unintentional and usually a result of surrounding construction activities. To prevent such damage, monitoring systems are indispensable. This paper focuses on acoustically recognizing road cutters, since they precede most construction activities in modern cities. Acoustic recognition can be achieved by installing a distributed computing sensor network along the pipelines and using smart sensors to “listen” for potential threats; if there is a real threat, some form of alarm is raised. For efficient pipeline monitoring, a novel monitoring approach is proposed. Principal Component Analysis (PCA) was studied and applied. Eigenvalues were regarded as the special signature that characterizes a sound sample, and were thus used as the feature vector for sound recognition. The denoising ability of PCA makes the approach robust to noise interference. A one-class SVM was used as the classifier. On-site experimental results show that the proposed PCA- and SVM-based acoustic recognition system is very effective, with a low tendency to raise false alarms.
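The idea of using PCA eigenvalues as a signature for a sound sample can be sketched as follows. The abstract does not say how the samples are arranged before PCA, so the framing scheme below (non-overlapping frames treated as observations), the frame length, the number of eigenvalues kept, and the name `eigenvalue_signature` are all assumptions made for illustration. The synthetic tone stands in for a real recording.

```python
import numpy as np

def eigenvalue_signature(signal, frame_len=64, n_values=8):
    """Return the n_values largest covariance eigenvalues of a framed signal."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    if n_frames < 2:
        raise ValueError("need at least two full frames")
    # Rows are non-overlapping frames; columns are sample positions in a frame.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    cov = np.cov(frames, rowvar=False)   # (frame_len, frame_len) covariance
    eigvals = np.linalg.eigvalsh(cov)    # ascending order for a symmetric matrix
    top = eigvals[::-1][:n_values]       # largest eigenvalues first
    return np.clip(top, 0.0, None)       # remove tiny negative round-off

# Synthetic stand-in for a sound sample: a noisy 440 Hz tone at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
sample = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(t.size)
features = eigenvalue_signature(sample)
print(features.shape)
```

A feature vector of this kind would then be passed to a one-class classifier trained only on the target sound; that stage is omitted from the sketch.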

Harmonic Parameters with HHT and Wavelet Transform for Automatic Sleep Stages Scoring

Previously, harmonic parameters (HPs) have been selected as features extracted from EEG signals for automatic sleep scoring. However, in previous studies only one HP was used, which was extracted directly from the whole epoch of the EEG signal. In this study, two different transformations were applied to extract HPs from EEG signals: the Hilbert-Huang transform (HHT) and the wavelet transform (WT). The EEG signals were decomposed by the two transformations, and features were extracted from the different components. Twelve parameters (four sets of HPs) were extracted. Some of the parameters differ markedly among sleep stages. The HPs from the two transformations were then used to build a rough sleep stage scoring model with an SVM classifier. The performance of this model is about 78% using the features obtained by our proposed extractions. Our results suggest that these features may be useful for automatic sleep stage scoring.

SVM Based Model as an Optimal Classifier for the Classification of Sonar Signals

Research into the problem of classifying sonar signals has been taken up as a challenging task for neural networks. This paper investigates the design of an optimal classifier using a Multi Layer Perceptron Neural Network (MLP NN) and Support Vector Machines (SVM). Results obtained using sonar data sets suggest that the SVM classifier performs well in comparison with the well-known MLP NN classifier. An average classification accuracy of 91.974% is achieved with the SVM classifier and 90.3609% with the MLP NN classifier on the test instances. The area under the Receiver Operating Characteristic (ROC) curve for the proposed SVM classifier on the test data set is 0.981183, which is very close to unity and confirms the excellent quality of the proposed classifier. The SVM classifier employed in this paper, implemented using the kernel Adatron algorithm, is seen to be robust and relatively insensitive to parameter initialization in comparison to the MLP NN.

An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions

This paper presents a novel method for improving cancer classification performance with very few microarray gene expression data. The method employs classification with individual gene ranking and gene subset ranking. The proposed method uses the same classifier for selection and classification. The method is applied to three publicly available cancer gene expression datasets: Lymphoma, Liver and Leukaemia. Three different classifiers, namely Support Vector Machines one-against-all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant Analysis (LDA), were tested. The results indicate that the SVM-OAA classifier improves on the other two classifiers, with satisfactory results on all three datasets.

A Bayesian Kernel for the Prediction of Protein-Protein Interactions

Understanding protein functions is a major goal in the post-genomic era. Proteins usually work in the context of other proteins and rarely function alone. Therefore, it is highly relevant to study the interaction partners of a protein in order to understand its function. Machine learning techniques have been widely applied to predict protein-protein interactions. Kernel functions play an important role in a successful machine learning technique. Choosing the appropriate kernel function can lead to better accuracy in a binary classifier such as the support vector machine. In this paper, we describe a Bayesian kernel for the support vector machine to predict protein-protein interactions. The Bayesian kernel can improve classifier performance by incorporating the probabilistic character of the available experimental protein-protein interaction data, which were compiled from different sources. In addition, the probabilistic output from the Bayesian kernel can help biologists direct further research toward the most confidently predicted interactions. The results show that the accuracy of the classifier is improved with the Bayesian kernel compared to the standard SVM kernels. These results imply that protein-protein interactions can be predicted more accurately with the Bayesian kernel than with the standard SVM kernels.

Practical Aspects of Face Recognition

Current face recognition systems often use either SVM or Adaboost techniques for the face detection part and PCA for the face recognition part. In this paper, we offer a novel method that provides not only a powerful face detection system based on Six-segment-filters (SSR) and Adaboost learning algorithms but also a face recognition system. A new dedicated face detection algorithm has been developed and connected with the recognition algorithm. As a result, we obtained high overall system performance compared with current systems. The proposed algorithm was tested on the CMU, FERET, UNIBE and MIT face databases, and significant performance was obtained.

Combination of Different Classifiers for Cardiac Arrhythmia Recognition

This paper describes a new supervised fusion (hybrid) electrocardiogram (ECG) classification solution consisting of a new QRS complex geometrical feature extraction method and a new version of the learning vector quantization (LVQ) classification algorithm aimed at overcoming the stability-plasticity dilemma. Toward this objective, after detection and delineation of the major events of the ECG signal via an appropriate algorithm, each QRS region and its corresponding discrete wavelet transform (DWT) are treated as virtual images, and each of them is divided into eight polar sectors. Then, the curve length of each excerpted segment is calculated and used as an element of the feature space. To increase the robustness of the proposed classification algorithm against noise, artifacts and arrhythmic outliers, a fusion structure consisting of five different classifiers, namely a Support Vector Machine (SVM), a Modified Learning Vector Quantization (MLVQ) and three Multi Layer Perceptron-Back Propagation (MLP–BP) neural networks with different topologies, was designed and implemented. The proposed algorithm was applied to all 48 MIT–BIH Arrhythmia Database records (within-record analysis), and the discrimination power of the classifier in isolating the different beat types of each record was assessed; as a result, an average accuracy of Acc=98.51% was obtained. The proposed method was also applied to six arrhythmia types (Normal, LBBB, RBBB, PVC, APB, PB) belonging to 20 different records of the aforementioned database (between-record analysis), and an average accuracy of Acc=95.6% was achieved. To evaluate the performance of the proposed hybrid learning machine, the obtained results were compared with similar peer-reviewed studies in this area.

Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Prediction of bacterial virulent protein sequences can assist in the identification and characterization of novel virulence-associated factors and in the discovery of drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation, which describes the functions of genes and gene products as a controlled vocabulary of terms, has been shown to be effective for a variety of tasks such as gene expression studies, GO annotation prediction and protein subcellular localization. In this study, we propose a sequence-based method, Virulent-GO, which mines informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologs with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO, using a single kind of GO term feature, achieves an accuracy of 82.5%, slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. In the independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When each kind of feature is evaluated on its own with SVM, the GO term feature performs much better than each of the five kinds of features.

Corporate Credit Rating using Multiclass Classification Models with order Information

Corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has been one of the attractive research topics in the literature. In recent years, multiclass classification models such as the artificial neural network (ANN) or the multiclass support vector machine (MSVM) have become very appealing machine learning approaches due to their good performance. However, most of them focus only on classifying samples into nominal categories, so the unique characteristic of credit ratings, ordinality, has seldom been considered. This study proposes new types of ANN and MSVM classifiers, named OMANN and OMSVM respectively. OMANN and OMSVM are designed to extend binary ANN or SVM classifiers by applying the ordinal pairwise partitioning (OPP) strategy. These models can handle multiple ordinal classes efficiently and effectively. To validate the usefulness of these two models, we applied them to a real-world bond rating case and compared the results with those of conventional approaches. The experimental results showed that our proposed models improve classification accuracy in comparison to typical multiclass classification techniques while requiring fewer computational resources.

Evaluating some Feature Selection Methods for an Improved SVM Classifier

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We show that the best results were obtained with the SVM_FS and GA_FS methods, which need a relatively small feature vector compared with the IG method, which involves longer vectors, for quite similar classification accuracies. We also present a novel method to better correlate the SVM kernel's parameters (Polynomial or Gaussian kernel).

Face Recognition with PCA and KPCA using Elman Neural Network and SVM

In this paper, Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) are used together with Elman neural network and Support Vector Machine (SVM) classifiers to categorize face pictures from the ORL database. The Elman network, a recurrent neural network, is proposed for modeling storage systems; it is also used to examine the effect of the number of PCA components on the classification accuracy and on the time needed to categorize the database pictures. The categorization stages are carried out with various numbers of components, and the results obtained with the Elman neural network and the support vector machine are compared. In the best case, a recognition accuracy of 97.41% is obtained.

Detection of Ultrasonic Images in the Presence of a Random Number of Scatterers: A Statistical Learning Approach

Support Vector Machine (SVM) is a statistical learning tool that was initially developed by Vapnik in 1979 and later developed into the more complex concept of structural risk minimization (SRM). SVM is playing an increasing role in detection problems across engineering, notably in statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM was applied to the detection of medical ultrasound images in the presence of partially developed speckle noise. The simulation was done for single-look and multi-look speckle models to give a complete overview of, and insight into, the newly proposed SVM-based detector. The structure of the SVM was derived and applied to clinical ultrasound images, and its performance in terms of the mean square error (MSE) metric was calculated. We showed that the SVM-detected ultrasound images have a very low MSE and are of good quality. The quality of the processed speckled images improved for the multi-look model. Furthermore, the contrast of the SVM-detected images was higher than that of the original non-noisy images, indicating that the SVM approach increased the distance between the pixel reflectivity levels (detection hypotheses) in the original images.

Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction and question answering. This paper reports on the development of an NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the four different named entity (NE) classes: Person name, Location name, Organization name and Miscellaneous name. We have used annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm to generate lexical context patterns from a part of the unlabeled Bengali news corpus. The lexical patterns have been used as features of the SVM in order to improve system performance. The NER system has been tested with gold standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation results have demonstrated recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show an improvement in the f-score of 5.13% with the use of context patterns. Statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
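As a quick check of the figures quoted above, the f-score can be recomputed from the reported recall and precision. The abstract does not state which f-score variant is used; the sketch below assumes the balanced F1 (the harmonic mean of precision and recall), and the function name `f_score` is hypothetical.

```python
def f_score(recall, precision):
    """Balanced F1: harmonic mean of precision and recall (both in percent)."""
    return 2.0 * precision * recall / (precision + recall)

# Recall and precision values as reported in the abstract.
bengali = f_score(recall=88.61, precision=80.12)
hindi = f_score(recall=80.23, precision=74.34)
print(f"Bengali F1 = {bengali:.2f}%, Hindi F1 = {hindi:.2f}%")
```

Under this assumption the recomputed values agree with the reported 84.15% and 77.17% to two decimal places.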

A Study of Gaps in CBMIR Using Different Methods and Prospective

In recent years, rapid advances in software and hardware in the field of information technology, along with a digital imaging revolution in the medical domain, have facilitated the generation and storage of large collections of images by hospitals and clinics. Searching these large image collections effectively and efficiently poses significant technical challenges and makes it necessary to construct intelligent retrieval systems. Content-based Image Retrieval (CBIR) consists of retrieving the images most visually similar to a given query image from a database of images [5]. Medical CBIR applications pose unique challenges but at the same time offer many new opportunities. On one hand, while one can easily understand news or sports videos, a medical image is often completely incomprehensible to untrained eyes.

An Optimal Feature Subset Selection for Leaf Analysis

This paper describes an optimal approach to feature subset selection for classifying leaves, based on a Genetic Algorithm (GA) and Kernel-Based Principal Component Analysis (KPCA). Because selecting the optimal features is highly complex, classification has become a critical task in analysing leaf image data. Initially, shape, texture and colour features are extracted from the leaf images. These extracted features are optimized by applying GA and KPCA separately. The approach then performs an intersection operation over the subsets obtained from the two optimization processes. Finally, the common matching subset is forwarded to train the Support Vector Machine (SVM). Our experimental results show that applying GA and KPCA for feature subset selection, with SVM as the classifier, is computationally effective and improves the accuracy of the classifier.
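The intersection step described in this abstract can be sketched in a few lines. The feature names and the two input subsets are hypothetical placeholders, not results from the paper, and the function name `common_subset` is an assumption for illustration. How GA and KPCA each arrive at their subsets is not shown.

```python
def common_subset(ga_selected, kpca_selected):
    """Return the features chosen by both selection methods, in sorted order."""
    return sorted(set(ga_selected) & set(kpca_selected))

# Hypothetical subsets produced by the two independent selection runs.
ga_selected = ["aspect_ratio", "mean_hue", "contrast", "eccentricity"]
kpca_selected = ["contrast", "entropy", "aspect_ratio", "mean_saturation"]

shared = common_subset(ga_selected, kpca_selected)
print(shared)  # ['aspect_ratio', 'contrast']
```

Only the columns named in `shared` would then be used to train the classifier.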

Extrapolation of Clinical Data from an Oral Glucose Tolerance Test Using a Support Vector Machine

To extract the important physiological factors related to diabetes from an oral glucose tolerance test (OGTT) by mathematical modeling, highly informative but convenient protocols are required. Current models require a large number of samples and an extended period of testing, which is not practical for daily use. The purpose of this study is to make model assessments possible even from a reduced number of samples taken over a relatively short period. For this purpose, test values were extrapolated using a support vector machine. A good correlation was found between the reference and extrapolated values in the 741 OGTTs evaluated. This result indicates that the number of clinical tests can be reduced through a computational approach.