Abstract: Predicting protein-protein interactions is a key step in understanding protein function, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques treat the task as a binary classification problem. Although it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins that can serve as negative examples. Therefore, in this paper we treat the task as a one-class classification problem and solve it with one-class support vector machines (SVMs). Using only positive examples (interacting protein pairs) in the training phase, the one-class SVM achieves an accuracy of about 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with accuracy comparable to that of binary classifiers that use artificially constructed negative examples.
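To illustrate training on positive examples only, here is a minimal sketch. It does not implement a one-class SVM: it swaps in a much simpler one-class rule (distance to the centroid of the positives, with a threshold chosen from the training data). The feature vectors, the function names, and the `quantile` parameter are assumptions made for illustration and do not come from the paper.

```python
import math

def fit_one_class(positives, quantile=0.9):
    """Fit a centroid and a distance threshold from positive examples only."""
    dim = len(positives[0])
    centroid = [sum(p[d] for p in positives) / len(positives) for d in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in positives)
    # Threshold: the distance at the chosen quantile of the training positives.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]

def predict(model, x):
    """Return True if x looks like the positive class (inside the radius)."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Hypothetical 2-D feature vectors for interacting protein pairs.
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
model = fit_one_class(positives)
```

No negative examples are needed at any point; anything far from the positives is simply rejected. A one-class SVM learns a more flexible boundary than this spherical one.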
Abstract: Classification is an interesting problem in functional
data analysis (FDA), because many scientific and applied problems
reduce to classification problems, such as recognition, prediction,
control, decision making, and management. Because functional data
(FD) are high-dimensional and highly correlated, a key problem is to
extract features from FD while keeping their global characteristics,
which strongly affects classification efficiency and precision. In
this paper, a novel automatic method that combines a Genetic Algorithm
(GA) with a classification algorithm to extract classification
features is proposed. In this method, the optimal features and
classification model are approached step by step through evolutionary
learning. Theoretical analysis and experiments show that this method
improves classification efficiency, precision, and robustness while
using fewer features, and that the dimension of the extracted
classification features can be controlled.
Abstract: In this paper, we present an innovative scheme of
blindly extracting message bits from an image distorted by an attack.
Support Vector Machine (SVM) is used to nonlinearly classify the
bits of the embedded message. Traditionally, a hard decoder is used
with the assumption that the underlying modeling of the Discrete
Cosine Transform (DCT) coefficients does not appreciably change.
In case of an attack, the distribution of the image coefficients is
heavily altered. The distribution of the sufficient statistics at the
receiving end corresponding to the antipodal signals overlap and a
simple hard decoder fails to classify them properly. We treat
message retrieval of antipodal signals as a binary classification
problem. Machine learning techniques such as SVM are used to retrieve
the message when a specific class of attacks is most probable. To
validate the SVM-based decoding scheme, we take Gaussian noise as a
test case. We generate a data set using 125 images and 25 different
keys. An SVM with a polynomial kernel achieved 100 percent accuracy on
the test data.
Abstract: In this paper, we propose a new method to distinguish
between arousal and relaxation states by using multiple features
acquired from a photoplethysmogram (PPG) and support vector
machine (SVM). To induce arousal and relaxation states in subjects, 2
kinds of sound stimuli are used, and their corresponding biosignals are
obtained using the PPG sensor. Two features–pulse to pulse interval
(PPI) and pulse amplitude (PA)–are extracted from acquired PPG
data, and a nonlinear classification between arousal and relaxation is
performed using SVM.
This methodology has several advantages when compared with
previous similar studies. Firstly, we extracted 2 separate features from
PPG, i.e., PPI and PA. Secondly, in order to improve the classification
accuracy, SVM-based nonlinear classification was performed.
Thirdly, to solve classification problems caused by generalized
features of whole subjects, we defined each threshold according to
individual features.
Experimental results showed that the average classification
accuracy was 74.67%. The proposed method also showed better
identification performance than single-feature-based methods.
From this result, we confirmed that arousal and relaxation can be
classified using SVM and PPG features.
Abstract: Face authentication for access control is a face
membership authentication task: based on face recognition, the system
admits the person with the incoming face if he or she is one of the
enrolled persons and rejects the person otherwise. Face membership
authentication is a two-class classification problem, to which the
SVM (Support Vector Machine) has been successfully applied with
better performance than conventional threshold-based classification.
However, most previous SVMs have been trained using image feature
vectors extracted from the face images of each class's members
(enrolled class/unenrolled class), so they are not robust to
variations in illumination, pose, and facial expression and are
strongly affected by changes in the member configuration of the
enrolled class.
In this paper, we propose an effective face membership
authentication method based on SVM using class discriminating
features, which distinctively represent an incoming face image's
associability with each class. These class discriminating features
are only weakly related to image features, so they are less affected
by variations in illumination, pose, and facial expression.
Through experiments, it is shown that the proposed face
membership authentication method performs better than the threshold
rule-based or the conventional SVM-based authentication methods and
is relatively less affected by changes in member size and membership.
Abstract: Feature selection is an important step in many pattern
classification problems. It is applied to select a subset of features,
from a much larger set, such that the selected subset is sufficient to
perform the classification task. Due to its importance, the problem of
feature selection has been investigated by many researchers. In this
paper, a novel feature subset search procedure that utilizes the Ant
Colony Optimization (ACO) is presented. The ACO is a
metaheuristic inspired by the behavior of real ants in their search for
the shortest paths to food sources. It looks for optimal solutions by
considering both local heuristics and previous knowledge. When
applied to two different classification problems, the proposed
algorithm achieved very promising results.
Abstract: In this paper, two models using a functional network
are employed to solve a classification problem. Functional networks
are generalized neural networks that permit the specification of
their initial topology using knowledge about the problem at hand. In
this case, after analyzing the available data and their relations, we
systematically discuss a numerical analysis method used for
functional networks and apply two functional network models to
the XOR problem. The XOR problem, which cannot be solved with a
two-layered neural network, can be solved by a two-layered functional
network, which reveals the computational power of functional
networks. The performance of the proposed model was validated
using classification problems.
Abstract: Although backpropagation ANNs generally predict
better than decision trees do for pattern classification problems, they
are often regarded as black boxes, i.e., their predictions cannot be
explained as readily as those of decision trees. In many applications, it is
desirable to extract knowledge from trained ANNs for the users to
gain a better understanding of how the networks solve the problems.
A new rule extraction algorithm, called rule extraction from artificial
neural networks (REANN) is proposed and implemented to extract
symbolic rules from ANNs. A standard three-layer feedforward ANN
is the basis of the algorithm. A four-phase training algorithm is
proposed for backpropagation learning. Explicitness of the extracted
rules is supported by comparing them to the symbolic rules generated
by other methods. The extracted rules are comparable with those of
other methods in terms of number of rules, average number of
conditions per rule, and predictive accuracy. Extensive experimental
studies on several benchmark classification problems, such as breast
cancer, iris, diabetes, and season classification, demonstrate the
effectiveness of the proposed approach and its good generalization
ability.
Abstract: Ensemble learning algorithms such as AdaBoost and
Bagging have been actively researched and have shown improvements in
classification results on several benchmark data sets, mainly with
decision trees as their base classifiers. In this paper we experiment
with applying these meta-learning techniques to classifiers such as
random forests, neural networks, and support vector machines. The data
sets are from MAGIC, a Cherenkov telescope experiment. The task is to
separate gamma signals from an overwhelming background of hadron and
muon signals, which makes it a rare-class classification problem. We
compare the individual classifiers with their ensemble counterparts
and discuss the results. WEKA, a machine learning toolkit, was used to
run the experiments.
Abstract: In this article we discuss improving multi-class
classification using a multilayer perceptron. The approach considered
consists of breaking down the n-class problem into two-class
subproblems. Each two-class subproblem is trained independently. In
the test phase, the vector to be classified is presented to all of the
two-class models, and the elected class is the strongest one, that is,
the one that does not lose any competition with the other classes. The
recognition rates obtained with the multi-class approach based on
two-class decomposition are clearly better than those obtained with
the simple multi-class approach.
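The pairwise decomposition and the "never loses a competition" election rule can be sketched as follows. This is an illustration under assumptions: each two-class model is a nearest-centroid rule standing in for the two-class multilayer perceptron, and the function names and toy data are hypothetical.

```python
from itertools import combinations
import math

def train_pairwise(samples_by_class):
    """Build one two-class model per pair of classes.

    Each model is just the pair of class centroids, a stand-in for the
    two-class perceptron (an assumption made for this sketch).
    """
    centroids = {
        label: [sum(col) / len(points) for col in zip(*points)]
        for label, points in samples_by_class.items()
    }
    return {(a, b): (centroids[a], centroids[b])
            for a, b in combinations(sorted(centroids), 2)}

def classify(models, x):
    """Return the class that wins every pairwise duel, or None if none does."""
    losses = {}
    for (a, b), (ca, cb) in models.items():
        if math.dist(x, ca) <= math.dist(x, cb):
            winner, loser = a, b
        else:
            winner, loser = b, a
        losses.setdefault(winner, 0)
        losses[loser] = losses.get(loser, 0) + 1
    unbeaten = [label for label, n in losses.items() if n == 0]
    return unbeaten[0] if len(unbeaten) == 1 else None

samples = {
    "a": [(0.0, 0.0), (0.0, 1.0)],
    "b": [(5.0, 5.0), (5.0, 6.0)],
    "c": [(10.0, 0.0), (10.0, 1.0)],
}
models = train_pairwise(samples)
```

An n-class problem yields n(n-1)/2 independent two-class models. Because every pair of classes meets once, at most one class can remain unbeaten; returning `None` when no class does is a design choice of this sketch.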
Abstract: The healthcare environment is generally perceived as
being information rich yet knowledge poor, and there is a lack of
effective analysis tools to discover hidden relationships and trends
in data. Valuable knowledge can be discovered by applying data mining
techniques in healthcare systems. In this study, we present a
methodology for extracting significant patterns from Coronary Heart
Disease warehouses for the prediction of heart attacks, which
unfortunately continue to be a leading cause of mortality worldwide.
For this purpose, we propose to dynamically enumerate the optimal
subsets of the reduced features of high interest by using the rough
sets technique combined with dynamic programming. We then propose to
validate the classification using a Random Forest (RF) of decision
trees to identify risky heart disease cases. This work is based on a
large amount of data collected from several clinical institutions,
based on patients' medical profiles. Moreover, experts' knowledge in
this field has been taken into consideration in order to define the
disease and its risk factors and to establish significant knowledge
relationships among the medical factors. A computer-aided system is
developed for this purpose based on a population of 525 adults. The
performance of the proposed model is analyzed and evaluated against a
set of benchmark techniques applied to this classification problem.
Abstract: Support vector machines (SVMs) are considered to be
the best machine learning algorithms for minimizing the predictive
probability of misclassification. However, their drawback is that for
large data sets the computation of the optimal decision boundary is a
time consuming function of the size of the training set. Hence several
methods have been proposed to speed up the SVM algorithm. Here
three methods used to speed up the computation of the SVM
classifiers are compared experimentally using a musical genre
classification problem. The simplest method pre-selects a random
sample of the data before the application of the SVM algorithm. Two
additional methods use proximity graphs to pre-select data that are
near the decision boundary. One uses k-Nearest Neighbor graphs and
the other Relative Neighborhood Graphs to accomplish the task.
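The two simplest pre-selection ideas can be sketched as below. The abstract does not give the exact selection criterion for the k-Nearest Neighbor graph method, so the rule used here (keep a point if any of its k nearest neighbours has a different label) is an assumption, as are the function names.

```python
import math
import random

def random_preselect(points, labels, fraction, seed=0):
    """Keep a random fraction of the training set."""
    rng = random.Random(seed)
    n_keep = max(1, int(fraction * len(points)))
    keep = sorted(rng.sample(range(len(points)), n_keep))
    return [points[i] for i in keep], [labels[i] for i in keep]

def knn_boundary_preselect(points, labels, k=3):
    """Keep points that have a differently labelled point among their
    k nearest neighbours, i.e. points likely to lie near the boundary."""
    kept = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        if any(labels[j] != labels[i] for j in others[:k]):
            kept.append(i)
    return [points[i] for i in kept], [labels[i] for i in kept]

# Toy 1-D data: class "x" on the left, class "y" on the right.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,), (7.0,)]
labels = ["x", "x", "x", "x", "y", "y", "y", "y"]
near_boundary = knn_boundary_preselect(points, labels, k=2)
```

Only the reduced set would then be passed to the SVM trainer. The brute-force neighbour search is quadratic in the number of points, so a real pre-selection step would need a faster neighbour structure to pay off.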
Abstract: Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. In neural networks that address classification problems, the training set, testing set, and learning rate are key elements: they are, respectively, the collection of input/output patterns used to train the network, the collection used to assess network performance, and the rate of weight adjustment. This paper describes a proposed backpropagation neural network classifier that performs cross-validation for the original neural network, in order to optimize classification accuracy and reduce training time. The feasibility and benefits of the proposed approach are demonstrated on five data sets: contact-lenses, cpu, weather symbolic, Weather, and labor-nega-data. It is shown that, compared to the existing neural network, training is more than 10 times faster when the dataset is larger than CPU or the network has many hidden units, while accuracy ('percent correct') was the same for all datasets except contact-lenses, which is the only one with missing attributes. For contact-lenses, the accuracy of the proposed neural network was on average about 0.3% lower than that of the original neural network. The algorithm is independent of specific data sets, so many of its ideas and solutions can be transferred to other classifier paradigms.
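The abstract does not say which cross-validation scheme is used. As a sketch, assuming ordinary k-fold cross-validation, the training and testing index sets could be generated like this (the function name is hypothetical):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(5, 2))
```

Each sample appears in exactly one test fold, so averaging 'percent correct' over the folds uses every pattern for both training and assessment. In practice the data would usually be shuffled before splitting.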
Abstract: In this paper, a clustering algorithm named K-Harmonic
Means (KHM) was employed in the training of Radial
Basis Function Networks (RBFNs). KHM organized the data in
clusters and determined the centres of the basis functions. The popular
clustering algorithms, namely K-means (KM) and Fuzzy c-means
(FCM), are highly dependent on the initial identification of elements
that represent the cluster well. In KHM, the problem can be avoided.
This leads to improvement in the classification performance when
compared to other clustering algorithms. A comparison of the
classification accuracy was performed between KM, FCM and KHM.
Classification performance was evaluated on the benchmark data sets
Iris Plant, Diabetes, and Breast Cancer. RBFN training with the KHM
algorithm shows better accuracy on these classification problems.
Abstract: This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) to solve the Web page classification problem. We apply inductive logic programming (ILP) as the strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner to boost the performance of the weak learner of ICT. We compare the results with supervised Naive Bayes, a well-known algorithm for text classification. The performance of our learning algorithm is also compared with that of other semi-supervised learning algorithms, namely Co-Training and EM. The experimental results show that the ICT algorithm outperforms those algorithms and that the performance of the weak learner can be enhanced by the ILP system.
Abstract: The purpose of this paper is to demonstrate the ability
of a genetic programming (GP) algorithm to evolve a team of data
classification models. The GP algorithm used in this work is
"multigene" in nature, i.e. there are multiple tree structures (genes)
that are used to represent team members. Each team member assigns
a data sample to one of a fixed set of output classes. A majority vote,
determined using the mode (highest occurrence) of classes predicted
by the individual genes, is used to determine the final class
prediction. The algorithm is tested on a binary classification problem.
For the case study investigated, compact classification models are
obtained with comparable accuracy to alternative approaches.
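The majority-vote step described above can be sketched as follows. The three "genes" here are hand-written threshold rules standing in for evolved tree structures; they and the function name are hypothetical.

```python
from collections import Counter

def team_predict(genes, sample):
    """Return the mode of the classes predicted by the individual genes."""
    votes = [gene(sample) for gene in genes]
    # most_common orders by count; among equal counts, the class that
    # was voted for first comes first.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical team of three binary classifiers on 2-D samples.
genes = [
    lambda x: int(x[0] > 0),
    lambda x: int(x[1] > 0),
    lambda x: int(x[0] + x[1] > 1),
]
```

For the sample `(1.0, -2.0)` the individual votes are 1, 0, and 0, so the team predicts class 0. With an odd number of genes and two classes, ties cannot occur.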
Abstract: The paper discusses the mathematics of pattern
indexing and its applications to recognition of visual patterns that are
found in video clips. It is shown that (a) pattern indexes can be
represented by collections of inverted patterns, and (b) solutions to
pattern classification problems can be found as intersections and
histograms of inverted patterns, so that matching of the original
patterns is avoided.
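The abstract does not define inverted patterns precisely. One plausible reading, sketched below as an assumption, is an inverted index: each feature value maps to the set of patterns containing it, exact lookups become set intersections, and approximate classification becomes a histogram of shared features. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(patterns):
    """Map each feature value to the set of pattern ids that contain it."""
    index = defaultdict(set)
    for pattern_id, features in patterns.items():
        for feature in features:
            index[feature].add(pattern_id)
    return index

def exact_candidates(index, query):
    """Patterns containing every query feature (intersection of sets)."""
    sets = [index.get(f, set()) for f in query]
    return set.intersection(*sets) if sets else set()

def vote_histogram(index, query):
    """Count, per pattern id, how many query features it shares."""
    histogram = Counter()
    for f in query:
        histogram.update(index.get(f, ()))
    return histogram

patterns = {
    "cat": {"ear", "tail", "whisker"},
    "car": {"wheel", "door"},
    "dog": {"ear", "tail", "snout"},
}
index = build_index(patterns)
```

Neither lookup compares the query against the stored patterns one by one; only the index entries for the query's own features are touched.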
Abstract: Support vector machines (SVMs) have shown
superior performance compared to other machine learning techniques,
especially in classification problems. Yet one limitation of SVMs is
the lack of an explanation capability which is crucial in some
applications, e.g. in the medical and security domains. In this paper, a
novel approach for eclectic rule-extraction from support vector
machines is presented. This approach utilizes the knowledge acquired
by the SVM and represented in its support vectors as well as the
parameters associated with them. The approach includes three stages:
training, propositional rule-extraction, and rule quality evaluation.
Results from four different experiments have demonstrated the value
of the approach for extracting comprehensible rules of high accuracy
and fidelity.
Abstract: This paper describes a new approach to classification
using genetic programming. The proposed technique consists of
genetically coevolving a population of non-linear transformations of
the input data to be classified and mapping the data to a new space
with reduced dimension, in order to maximize inter-class
discrimination. The classification of new samples is then performed
on the transformed data and so becomes much easier. Unlike existing
GP-classification techniques, the proposed one uses a dynamic
partition of the transformed data into separate intervals; the
efficacy of a given partition into intervals is handled by the
fitness criterion, which favours maximum class discrimination.
Experiments were first performed using Fisher's Iris dataset, and
then the KDD-99 Cup dataset was used to study the intrusion detection
and classification problem. The results obtained demonstrate that the
proposed genetic approach outperforms the existing GP-classification
methods [1], [2], and [3], and gives very acceptable results compared
to the other existing techniques proposed in [4], [5], [6], [7], and
[8].
Abstract: The present study presents a new approach to automatic
data clustering and classification problems in large and complex
databases and, at the same time, derives specific types of explicit rules
describing each cluster. The method works well in both sparse and
dense multidimensional data spaces. The members of the data space
can be of the same nature or represent different classes. A number
of N-dimensional ellipsoids are used for enclosing the data clouds.
Due to the geometry of an ellipsoid and its free rotation in space,
the detection of clusters becomes very efficient. The method is based
on genetic algorithms that are used for the optimization of location,
orientation and geometric characteristics of the hyper-ellipsoids. The
proposed approach can serve as a basis for the development of
general knowledge systems for discovering hidden knowledge and
unexpected patterns and rules in various large databases.
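The building block of such a method, testing whether a point lies inside a rotated ellipsoid, can be sketched in two dimensions as below. The parameterisation (centre, semi-axes, rotation angle) and the simple enclosure score are assumptions made for illustration; the abstract does not specify the encoding or the fitness function used by the genetic algorithm.

```python
import math

def in_ellipse(point, centre, semi_axes, angle):
    """Return True if a 2-D point lies inside a rotated ellipse."""
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate the offset into the ellipse's own axis-aligned frame.
    u = dx * cos_a + dy * sin_a
    v = -dx * sin_a + dy * cos_a
    a, b = semi_axes
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def enclosure_score(points, labels, target, ellipse):
    """Hypothetical fitness: enclosed target points minus enclosed others."""
    score = 0
    for p, label in zip(points, labels):
        if in_ellipse(p, *ellipse):
            score += 1 if label == target else -1
    return score

# Ellipse centred at the origin, semi-axes 2 and 1, rotated by 45 degrees.
ellipse = ((0.0, 0.0), (2.0, 1.0), math.pi / 4)
points = [(1.0, 1.0), (0.0, 0.0), (1.0, -1.0), (0.5, 0.5)]
labels = ["a", "a", "b", "b"]
```

A genetic algorithm would search over the ellipse parameters to maximise a score of this kind. The membership inequality itself then serves as an explicit rule describing the cluster.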