Abstract: A fusion classifier composed of two modules, one made by a hidden Markov model (HMM) and the other by a support vector machine (SVM), is proposed to recognize faces with pose variations in open-set recognition settings. The HMM module captures the evolution of facial features across a subject-s face using the subject-s facial images only, without referencing to the faces of others. Because of the captured evolutionary process of facial features, the HMM module retains certain robustness against pose variations, yielding low false rejection rates (FRR) for recognizing faces across poses. This is, however, on the price of poor false acceptance rates (FAR) when recognizing other faces because it is built upon withinclass samples only. The SVM module in the proposed model is developed following a special design able to substantially diminish the FAR and further lower down the FRR. The proposed fusion classifier has been evaluated in performance using the CMU PIE database, and proven effective for open-set face recognition with pose variations. Experiments have also shown that it outperforms the face classifier made by HMM or SVM alone.
Abstract: It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.
Abstract: Support Vector Machine (SVM) is a recent class of statistical classification and regression techniques playing an increasing role in applications to detection problems in various engineering problems, notably in statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM is applied to an infrared (IR) binary communication system with different types of channel models including Ricean multipath fading and partially developed scattering channel with additive white Gaussian noise (AWGN) at the receiver. The structure and performance of SVM in terms of the bit error rate (BER) metric is derived and simulated for these channel stochastic models and the computational complexity of the implementation, in terms of average computational time per bit, is also presented. The performance of SVM is then compared to classical binary signal maximum likelihood detection using a matched filter driven by On-Off keying (OOK) modulation. We found that the performance of SVM is superior to that of the traditional optimal detection schemes used in statistical communication, especially for very low signal-to-noise ratio (SNR) ranges. For large SNR, the performance of the SVM is similar to that of the classical detectors. The implication of these results is that SVM can prove very beneficial to IR communication systems that notoriously suffer from low SNR at the cost of increased computational complexity.
Abstract: The aim of this paper is to present a methodology in
three steps to forecast supply chain demand. In first step, various data
mining techniques are applied in order to prepare data for entering
into forecasting models. In second step, the modeling step, an
artificial neural network and support vector machine is presented
after defining Mean Absolute Percentage Error index for measuring
error. The structure of artificial neural network is selected based on
previous researchers' results and in this article the accuracy of
network is increased by using sensitivity analysis. The best forecast
for classical forecasting methods (Moving Average, Exponential
Smoothing, and Exponential Smoothing with Trend) is resulted based
on prepared data and this forecast is compared with result of support
vector machine and proposed artificial neural network. The results
show that artificial neural network can forecast more precisely in
comparison with other methods. Finally, forecasting methods'
stability is analyzed by using raw data and even the effectiveness of
clustering analysis is measured.
Abstract: Non-Destructive evaluation of in-service power
transformer condition is necessary for avoiding catastrophic failures.
Dissolved Gas Analysis (DGA) is one of the important methods.
Traditional, statistical and intelligent DGA approaches have been
adopted for accurate classification of incipient fault sources.
Unfortunately, there are not often enough faulty patterns required for
sufficient training of intelligent systems. By bootstrapping the
shortcoming is expected to be alleviated and algorithms with better
classification success rates to be obtained. In this paper the
performance of an artificial neural network, K-Nearest Neighbour
and support vector machine methods using bootstrapped data are
detailed and shown that while the success rate of the ANN algorithms
improves remarkably, the outcome of the others do not benefit so
much from the provided enlarged data space. For assessment, two
databases are employed: IEC TC10 and a dataset collected from
reported data in papers. High average test success rate well exhibits
the remarkable outcome.
Abstract: Support Vector Machine (SVM) is a statistical
learning tool developed to a more complex concept of
structural risk minimization (SRM). In this paper, SVM is
applied to signal detection in communication systems in the
presence of channel noise in various environments in the form
of Rayleigh fading, additive white Gaussian background noise
(AWGN), and interference noise generalized as additive color
Gaussian noise (ACGN). The structure and performance of
SVM in terms of the bit error rate (BER) metric is derived and
simulated for these advanced stochastic noise models and the
computational complexity of the implementation, in terms of
average computational time per bit, is also presented. The
performance of SVM is then compared to conventional binary
signaling optimal model-based detector driven by binary
phase shift keying (BPSK) modulation. We show that the
SVM performance is superior to that of conventional matched
filter-, innovation filter-, and Wiener filter-driven detectors,
even in the presence of random Doppler carrier deviation,
especially for low SNR (signal-to-noise ratio) ranges. For
large SNR, the performance of the SVM was similar to that of
the classical detectors. However, the convergence between
SVM and maximum likelihood detection occurred at a higher
SNR as the noise environment became more hostile.
Abstract: This paper describes a new supervised fusion (hybrid)
electrocardiogram (ECG) classification solution consisting of a new
QRS complex geometrical feature extraction as well as a new version
of the learning vector quantization (LVQ) classification algorithm
aimed for overcoming the stability-plasticity dilemma. Toward this
objective, after detection and delineation of the major events of ECG
signal via an appropriate algorithm, each QRS region and also its
corresponding discrete wavelet transform (DWT) are supposed as
virtual images and each of them is divided into eight polar sectors.
Then, the curve length of each excerpted segment is calculated
and is used as the element of the feature space. To increase the
robustness of the proposed classification algorithm versus noise,
artifacts and arrhythmic outliers, a fusion structure consisting of
five different classifiers namely as Support Vector Machine (SVM),
Modified Learning Vector Quantization (MLVQ) and three Multi
Layer Perceptron-Back Propagation (MLP–BP) neural networks with
different topologies were designed and implemented. The new proposed
algorithm was applied to all 48 MIT–BIH Arrhythmia Database
records (within–record analysis) and the discrimination power of the
classifier in isolation of different beat types of each record was
assessed and as the result, the average accuracy value Acc=98.51%
was obtained. Also, the proposed method was applied to 6 number
of arrhythmias (Normal, LBBB, RBBB, PVC, APB, PB) belonging
to 20 different records of the aforementioned database (between–
record analysis) and the average value of Acc=95.6% was achieved.
To evaluate performance quality of the new proposed hybrid learning
machine, the obtained results were compared with similar peer–
reviewed studies in this area.
Abstract: In this paper, in order to categorize ORL database face
pictures, principle Component Analysis (PCA) and Kernel Principal
Component Analysis (KPCA) methods by using Elman neural
network and Support Vector Machine (SVM) categorization methods
are used. Elman network as a recurrent neural network is proposed
for modeling storage systems and also it is used for reviewing the
effect of using PCA numbers on system categorization precision rate
and database pictures categorization time. Categorization stages are
conducted with various components numbers and the obtained results
of both Elman neural network categorization and support vector
machine are compared. In optimum manner 97.41% recognition
accuracy is obtained.
Abstract: Support Vector Machine (SVM) is a statistical learning tool that was initially developed by Vapnik in 1979 and later developed to a more complex concept of structural risk minimization (SRM). SVM is playing an increasing role in applications to detection problems in various engineering problems, notably in statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM was applied to the detection of medical ultrasound images in the presence of partially developed speckle noise. The simulation was done for single look and multi-look speckle models to give a complete overlook and insight to the new proposed model of the SVM-based detector. The structure of the SVM was derived and applied to clinical ultrasound images and its performance in terms of the mean square error (MSE) metric was calculated. We showed that the SVM-detected ultrasound images have a very low MSE and are of good quality. The quality of the processed speckled images improved for the multi-look model. Furthermore, the contrast of the SVM detected images was higher than that of the original non-noisy images, indicating that the SVM approach increased the distance between the pixel reflectivity levels (detection hypotheses) in the original images.
Abstract: Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.
Abstract: In recent years, rapid advances in software and hardware in the field of information technology along with a digital imaging revolution in the medical domain facilitate the generation and storage of large collections of images by hospitals and clinics. To search these large image collections effectively and efficiently poses significant technical challenges, and it raises the necessity of constructing intelligent retrieval systems. Content-based Image Retrieval (CBIR) consists of retrieving the most visually similar images to a given query image from a database of images[5]. Medical CBIR (content-based image retrieval) applications pose unique challenges but at the same time offer many new opportunities. On one hand, while one can easily understand news or sports videos, a medical image is often completely incomprehensible to untrained eyes.
Abstract: This paper describes an optimal approach for feature
subset selection to classify the leaves based on Genetic Algorithm
(GA) and Kernel Based Principle Component Analysis (KPCA). Due
to high complexity in the selection of the optimal features, the
classification has become a critical task to analyse the leaf image
data. Initially the shape, texture and colour features are extracted
from the leaf images. These extracted features are optimized through
the separate functioning of GA and KPCA. This approach performs
an intersection operation over the subsets obtained from the
optimization process. Finally, the most common matching subset is
forwarded to train the Support Vector Machine (SVM). Our
experimental results successfully prove that the application of GA
and KPCA for feature subset selection using SVM as a classifier is
computationally effective and improves the accuracy of the classifier.
Abstract: Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.
Abstract: Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate between predefined classes such as patients with a special disease versus healthy controls. However, most of the existing research only focuses on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, which are Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest based on mock datasets. We mimic common biological scenarios simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The result shows that SVM performs quite stable and reaches a higher AUC compared to other methods. This may be explained due to the ability of SVM to minimize the probability of error. Moreover, Decision Tree with its good applicability for diagnosis and prognosis shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better when having a higher number of discriminators.