Abstract: Face is a non-intrusive strong biometrics for
identification of original and dummy facial by different artificial
means. Face recognition is extremely important in the contexts of
computer vision, psychology, surveillance, pattern recognition,
neural network, content based video processing. The availability of a
widespread face database is crucial to test the performance of these
face recognition algorithms. The openly available face databases
include face images with a wide range of poses, illumination, gestures
and face occlusions but there is no dummy face database accessible in
public domain. This paper presents a face detection algorithm based on
the image segmentation in terms of distance from a fixed point and
template matching methods. This proposed work is having the most
appropriate number of nodal points resulting in most appropriate
outcomes in terms of face recognition and detection. The time taken to
identify and extract distinctive facial features is improved in the range
of 90 to 110 sec. with the increment of efficiency by 3%.
Abstract: Sentiment analysis means to classify a given review
document into positive or negative polar document. Sentiment
analysis research has been increased tremendously in recent times
due to its large number of applications in the industry and academia.
Sentiment analysis models can be used to determine the opinion of
the user towards any entity or product. E-commerce companies can
use sentiment analysis model to improve their products on the basis
of users’ opinion. In this paper, we propose a new One-class Support
Vector Machine (One-class SVM) based sentiment analysis model
for movie review documents. In the proposed approach, we initially
extract features from one class of documents, and further test the
given documents with the one-class SVM model if a given new test
document lies in the model or it is an outlier. Experimental results
show the effectiveness of the proposed sentiment analysis model.
Abstract: Development of a method to estimate gene functions is
an important task in bioinformatics. One of the approaches for the
annotation is the identification of the metabolic pathway that genes are
involved in. Since gene expression data reflect various intracellular
phenomena, those data are considered to be related with genes’
functions. However, it has been difficult to estimate the gene function
with high accuracy. It is considered that the low accuracy of the
estimation is caused by the difficulty of accurately measuring a gene
expression. Even though they are measured under the same condition,
the gene expressions will vary usually. In this study, we proposed a
feature extraction method focusing on the variability of gene
expressions to estimate the genes' metabolic pathway accurately. First,
we estimated the distribution of each gene expression from replicate
data. Next, we calculated the similarity between all gene pairs by KL
divergence, which is a method for calculating the similarity between
distributions. Finally, we utilized the similarity vectors as feature
vectors and trained the multiclass SVM for identifying the genes'
metabolic pathway. To evaluate our developed method, we applied the
method to budding yeast and trained the multiclass SVM for
identifying the seven metabolic pathways. As a result, the accuracy
that calculated by our developed method was higher than the one that
calculated from the raw gene expression data. Thus, our developed
method combined with KL divergence is useful for identifying the
genes' metabolic pathway.
Abstract: This paper presents a wavelet transform and Support
Vector Machine (SVM) based algorithm for estimating fault location
on transmission lines. The Discrete wavelet transform (DWT) is used
for data pre-processing and this data are used for training and testing
SVM. Five types of mother wavelet are used for signal processing to
identify a suitable wavelet family that is more appropriate for use in
estimating fault location. The results demonstrated the ability of SVM
to generalize the situation from the provided patterns and to
accurately estimate the location of faults with varying fault resistance.
Abstract: This paper explores the scalability issues associated
with solving the Named Entity Recognition (NER) problem using
Support Vector Machines (SVM) and high-dimensional features. The
performance results of a set of experiments conducted using binary
and multi-class SVM with increasing training data sizes are
examined. The NER domain chosen for these experiments is the
biomedical publications domain, especially selected due to its
importance and inherent challenges. A simple machine learning
approach is used that eliminates prior language knowledge such as
part-of-speech or noun phrase tagging thereby allowing for its
applicability across languages. No domain-specific knowledge is
included. The accuracy measures achieved are comparable to those
obtained using more complex approaches, which constitutes a
motivation to investigate ways to improve the scalability of multiclass
SVM in order to make the solution more practical and useable.
Improving training time of multi-class SVM would make support
vector machines a more viable and practical machine learning
solution for real-world problems with large datasets. An initial
prototype results in great improvement of the training time at the
expense of memory requirements.
Abstract: Tumor classification is a key area of research in the
field of bioinformatics. Microarray technology is commonly used in
the study of disease diagnosis using gene expression levels. The
main drawback of gene expression data is that it contains thousands
of genes and a very few samples. Feature selection methods are used
to select the informative genes from the microarray. These methods
considerably improve the classification accuracy. In the proposed
method, Genetic Algorithm (GA) is used for effective feature
selection. Informative genes are identified based on the T-Statistics,
Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate
solutions of GA are obtained from top-m informative genes. The
classification accuracy of k-Nearest Neighbor (kNN) method is used
as the fitness function for GA. In this work, kNN and Support Vector
Machine (SVM) are used as the classifiers. The experimental results
show that the proposed work is suitable for effective feature
selection. With the help of the selected genes, GA-kNN method
achieves 100% accuracy in 4 datasets and GA-SVM method
achieves in 5 out of 10 datasets. The GA with kNN and SVM
methods are demonstrated to be an accurate method for microarray
based tumor classification.
Abstract: It has become crucial over the years for nations to
improve their credit scoring methods and techniques in light of the
increasing volatility of the global economy. Statistical methods or
tools have been the favoured means for this; however artificial
intelligence or soft computing based techniques are becoming
increasingly preferred due to their proficient and precise nature and
relative simplicity. This work presents a comparison between Support
Vector Machines and Artificial Neural Networks two popular soft
computing models when applied to credit scoring. Amidst the
different criteria-s that can be used for comparisons; accuracy,
computational complexity and processing times are the selected
criteria used to evaluate both models. Furthermore the German credit
scoring dataset which is a real world dataset is used to train and test
both developed models. Experimental results obtained from our study
suggest that although both soft computing models could be used with
a high degree of accuracy, Artificial Neural Networks deliver better
results than Support Vector Machines.
Abstract: Power System Security is a major concern in real time
operation. Conventional method of security evaluation consists of
performing continuous load flow and transient stability studies by
simulation program. This is highly time consuming and infeasible
for on-line application. Pattern Recognition (PR) is a promising
tool for on-line security evaluation. This paper proposes a Support
Vector Machine (SVM) based binary classification for static and
transient security evaluation. The proposed SVM based PR approach
is implemented on New England 39 Bus and IEEE 57 Bus systems.
The simulation results of SVM classifier is compared with the other
classifier algorithms like Method of Least Squares (MLS), Multi-
Layer Perceptron (MLP) and Linear Discriminant Analysis (LDA)
classifiers.
Abstract: In this paper a novel method for finding the fault zone
on a Thyristor Controlled Series Capacitor (TCSC) incorporated
transmission line is presented. The method makes use of the Support
Vector Machine (SVM), used in the classification mode to
distinguish between the zones, before or after the TCSC. The use of
Discrete Wavelet Transform is made to prepare the features which
would be given as the input to the SVM. This method was tested on a
400 kV, 50 Hz, 300 Km transmission line and the results were highly
accurate.
Abstract: In this paper, we propose a robust disease detection
method, called adaptive orientation code matching (Adaptive OCM),
which is developed from a robust image registration algorithm:
orientation code matching (OCM), to achieve continuous and
site-specific detection of changes in plant disease. We use two-stage
framework for realizing our research purpose; in the first stage,
adaptive OCM was employed which could not only realize the
continuous and site-specific observation of disease development, but
also shows its excellent robustness for non-rigid plant object searching
in scene illumination, translation, small rotation and occlusion changes
and then in the second stage, a machine learning method of support
vector machine (SVM) based on a feature of two dimensional (2D)
xy-color histogram is further utilized for pixel-wise disease
classification and quantification. The indoor experiment results
demonstrate the feasibility and potential of our proposed algorithm,
which could be implemented in real field situation for better
observation of plant disease development.
Abstract: Term Extraction, a key data preparation step in Text
Mining, extracts the terms, i.e. relevant collocation of words,
attached to specific concepts (e.g. genetic-algorithms and decisiontrees
are terms associated to the concept “Machine Learning" ). In
this paper, the task of extracting interesting collocations is achieved
through a supervised learning algorithm, exploiting a few
collocations manually labelled as interesting/not interesting. From
these examples, the ROGER algorithm learns a numerical function,
inducing some ranking on the collocations. This ranking is optimized
using genetic algorithms, maximizing the trade-off between the false
positive and true positive rates (Area Under the ROC curve). This
approach uses a particular representation for the word collocations,
namely the vector of values corresponding to the standard statistical
interestingness measures attached to this collocation. As this
representation is general (over corpora and natural languages),
generality tests were performed by experimenting the ranking
function learned from an English corpus in Biology, onto a French
corpus of Curriculum Vitae, and vice versa, showing a good
robustness of the approaches compared to the state-of-the-art Support
Vector Machine (SVM).
Abstract: It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.