Abstract: A main drawback of intrusion detection systems is their
inability to detect new attacks that do not have known
signatures. In this paper we discuss an intrusion detection method
that uses independent component analysis (ICA) based feature
selection heuristics and rough fuzzy clustering of the data. ICA
separates the independent components (ICs) from the monitored
variables. Rough sets reduce the amount of data and remove
redundancy, and fuzzy methods allow objects to belong to several
clusters simultaneously, with different degrees of membership. Our
approach allows us not only to recognize known attacks but also to
detect activity that may be the result of a new, unknown attack.
Experimental results are reported on the Knowledge Discovery and
Data Mining (KDD Cup 1999) dataset.
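The idea of belonging to several clusters with different degrees of membership can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means membership formula with Euclidean distance; the abstract does not state which formula the rough fuzzy method uses, and the function name `fuzzy_memberships`, the fuzzifier `m`, and the example centers are illustrative choices only.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), where d_k is the
    Euclidean distance to center k and m > 1 is the fuzzifier (assumed).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        share = 1.0 / sum(on_center)
        return [share if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

# Hypothetical two-cluster example: the point is three times closer
# to the first center than to the second.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships((1.0, 0.0), centers)
print(u)  # memberships sum to 1; the nearer center gets the larger share
```

With `m = 2` the memberships fall off with the squared distance ratio, so a record that sits between a "normal" and an "attack" cluster keeps a non-zero degree in both rather than being forced into one.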
Abstract: In this paper a one-dimensional Self-Organizing Map (SOM)
algorithm for feature selection is presented. The
algorithm is based on a first classification of the input dataset in a
similarity space. From this classification, a set of positive and
negative features is computed for each class. This set of features is
selected as the result of the procedure. The procedure is evaluated on an
in-house dataset from a Knowledge Discovery from Text (KDT)
application and on a set of publicly available datasets used in
international feature selection competitions. These datasets come
from KDT applications, drug discovery as well as other applications.
The knowledge of the correct classification available for the training
and validation datasets is used to optimize the parameters for positive
and negative feature extractions. The process becomes feasible for
large and sparse datasets, such as the ones obtained in KDT applications,
by using both compression techniques to store the similarity matrix
and speed-up techniques for the Kohonen algorithm that take
advantage of the sparsity of the input matrix. These improvements
make it feasible to apply the methodology to massive datasets by
using the grid.
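The first step, classifying the inputs with a one-dimensional SOM, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes dense input vectors and Euclidean distance rather than the compressed similarity matrix described above, and the names `train_som_1d` and `best_unit`, the linear decay schedule, and the toy data are all assumptions.

```python
import math
import random

def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))

def train_som_1d(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a chain of n_units Kohonen units on a list of vectors."""
    rng = random.Random(seed)
    # Initialise each unit from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(1.0, n_units / 2.0)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs           # linear decay (assumed schedule)
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood along the one-dimensional chain.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical toy data: two well-separated groups of points.
group_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
group_b = [(10.0, 10.0), (9.8, 10.1), (10.1, 9.7)]
weights = train_som_1d(group_a + group_b, n_units=2)
units_a = {best_unit(weights, x) for x in group_a}
units_b = {best_unit(weights, x) for x in group_b}
print(units_a, units_b)
```

Each unit then defines a class, and the positive and negative features of a class would be computed from the inputs mapped to that unit; that step is not sketched here because the abstract does not specify how it is done.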
Abstract: This paper presents a new approach for the protection
of Thyristor-Controlled Series Compensator (TCSC) lines using a
Support Vector Machine (SVM). One SVM is trained for fault
classification and another for section identification. This method uses
three-phase current measurements, which results in better speed and
accuracy than other SVM-based methods that use single-phase
current measurements. This makes it suitable for real-time protection.
The method was tested on 10,000 data instances with a very wide
variation in system conditions such as compensation level, source
impedance, location of fault, fault inception angle, load angle at
source bus and fault resistance. The proposed method requires only
local current measurement.
Abstract: As network-based technologies become
omnipresent, the demand to secure networks and systems against threats
increases. One effective way to achieve higher security is
through the use of intrusion detection systems (IDS), which are
software tools that detect anomalies in a computer or network. In this
paper, an IDS is developed using an improved machine-learning
algorithm, the Locally Linear Neuro-Fuzzy (LLNF) model,
for classification, although this model was originally used for
system identification. A key technical challenge in IDS and LLNF
learning is the curse of high dimensionality. Therefore a feature
selection phase is proposed which is applicable to any IDS. By
investigating the use of three feature selection algorithms in this
model, it is shown that adding a feature selection phase reduces the
computational complexity of our model. Feature selection algorithms
require the use of a feature goodness measure. The use of both a
linear and a non-linear measure, the linear correlation coefficient and
mutual information respectively, is investigated.
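The difference between the two goodness measures can be seen in a small sketch. It assumes discrete (or already discretised) values for the mutual information estimate; the function names `pearson` and `mutual_information` and the toy data are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

def pearson(x, y):
    """Linear correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence carries no linear information
    return cov / math.sqrt(vx * vy)

def mutual_information(x, y):
    """Mutual information in bits, estimated from empirical frequencies."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# Hypothetical feature/target pair with a purely non-linear dependency.
feature = [-2, -1, 0, 1, 2]
target = [v * v for v in feature]
r = pearson(feature, target)
mi = mutual_information(feature, target)
print(r, mi)  # correlation is zero, mutual information is clearly positive
```

Here the target is fully determined by the feature, yet the linear measure scores it as uninformative, which is the kind of case where a non-linear measure ranks features differently.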
Abstract: Serial Analysis of Gene Expression (SAGE) is a powerful
quantification technique for generating cell or tissue gene expression
data. The profile of the gene expression of cell or tissue in several
different states is difficult for biologists to analyze because of the large
number of genes typically involved. However, feature selection in
machine learning can mitigate this problem. It
reduces the number of features (genes) in specific SAGE data and
retains only the relevant genes. In this study, we used a genetic
algorithm to implement feature selection and evaluated the
classification accuracy of the selected features with the K-nearest
neighbor method. In order to validate the proposed method, we used
two SAGE data sets for testing. The results of this study conclusively
prove that the number of features of the original SAGE data set can be
significantly reduced and higher classification accuracy can be
achieved.
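A genetic algorithm wrapped around a K-nearest neighbor classifier can be sketched as below. This is a minimal illustration under stated assumptions, not the study's implementation: the leave-one-out fitness, tournament selection, one-point crossover, elitism, all parameter values, and the synthetic data (which is not SAGE data) are choices made for the example.

```python
import random

def loo_knn_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only features where mask is 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        nearest = sorted(
            (sum((xi[c] - xj[c]) ** 2 for c in cols), y[j])
            for j, xj in enumerate(X) if j != i
        )[:k]
        votes = [label for _, label in nearest]
        if max(sorted(set(votes)), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search for a feature bit mask with high k-NN accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda mask: loo_knn_accuracy(X, y, mask)
    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)  # tournament of two
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n - 1)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic data: feature 0 separates the classes, features 1 and 2 are noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[label * 5.0 + data_rng.random(),
      data_rng.uniform(0, 20), data_rng.uniform(0, 20)] for label in y]
mask, score = ga_select(X, y)
print(mask, score)
```

Because the all-features mask is in the initial population and the best mask is carried over unchanged, the returned accuracy can never be lower than the accuracy obtained with every feature.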
Abstract: The Ant Colony Optimization (ACO) is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It has recently attracted a lot of attention and has been successfully applied to a number of different optimization problems. Due to the importance of the feature selection problem and the potential of ACO, this paper presents a novel method that utilizes the ACO algorithm to implement a feature subset search procedure. Initial results obtained using the classification of speech segments are very promising.
Abstract: Feature selection is an important step in many pattern
classification problems. It is applied to select a subset of features,
from a much larger set, such that the selected subset is sufficient to
perform the classification task. Due to its importance, the problem of
feature selection has been investigated by many researchers. In this
paper, a novel feature subset search procedure that utilizes the Ant
Colony Optimization (ACO) is presented. The ACO is a
metaheuristic inspired by the behavior of real ants in their search for
the shortest paths to food sources. It looks for optimal solutions by
considering both local heuristics and previous knowledge. When
applied to two different classification problems, the proposed
algorithm achieved very promising results.
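One way to combine local heuristics with previous knowledge in a subset search is sketched below. It is a generic ACO-style loop written for illustration, not the procedure proposed in the paper: the per-feature pheromone `tau`, the heuristic `eta`, the exponents `alpha` and `beta`, the evaporation rate `rho`, and the toy scoring function are all assumptions, and scores are assumed to be non-negative.

```python
import random

def build_subset(rng, tau, eta, k, alpha, beta):
    """One ant picks k distinct features, weighted by pheromone and heuristic."""
    candidates = list(range(len(tau)))
    chosen = []
    for _ in range(k):
        weights = [tau[f] ** alpha * eta[f] ** beta for f in candidates]
        pick = rng.choices(candidates, weights=weights)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return sorted(chosen)

def aco_feature_search(n_features, k, evaluate, eta=None, n_ants=10,
                       n_iter=20, alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Return the best subset of size k found and its score."""
    rng = random.Random(seed)
    tau = [1.0] * n_features             # pheromone: knowledge from past ants
    eta = eta or [1.0] * n_features      # local heuristic: per-feature merit
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            subset = build_subset(rng, tau, eta, k, alpha, beta)
            tours.append((evaluate(subset), subset))
        tau = [(1.0 - rho) * t for t in tau]       # evaporation
        for score, subset in tours:
            for f in subset:
                tau[f] += score                    # deposit on used features
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical scoring function: only features 0 and 2 are useful.
def toy_score(subset):
    return len(set(subset) & {0, 2}) / 2.0

best_subset, best_score = aco_feature_search(5, 2, toy_score)
print(best_subset, best_score)
```

In practice `evaluate` would be a classifier's validation accuracy on the chosen features, and `eta` could come from a filter measure such as a per-feature relevance score.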
Abstract: In this paper a combined feature selection method is
proposed which takes advantage of sample domain filtering,
resampling and feature subset evaluation methods to reduce
the dimensionality of huge datasets and select reliable features. This method
utilizes both feature space and sample domain to improve the process
of feature selection and uses a combination of Chi squared with
Consistency attribute evaluation methods to seek reliable features.
This method consists of two phases. The first phase filters and
resamples the sample domain and the second phase adopts a hybrid
procedure to find the optimal feature space by applying Chi squared,
Consistency subset evaluation methods and genetic search.
Experiments on datasets of various sizes from the UCI Repository of
Machine Learning databases show that the performance of five
classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First
Decision Tree and JRIP) improves simultaneously and the
classification error for these classifiers decreases considerably. The
experiments also show that this method outperforms other feature
selection methods.
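The Chi squared part of the evaluation can be illustrated with a minimal sketch that scores one categorical feature against the class labels. It shows only the textbook statistic; the filtering, resampling, Consistency evaluation and genetic search phases are not reproduced, and the function name `chi_squared` and the toy data are illustrative.

```python
from collections import Counter

def chi_squared(feature, labels):
    """Chi squared statistic of a categorical feature against class labels."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in f_counts.items():
        for l, nl in l_counts.items():
            expected = nf * nl / n  # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat

# Hypothetical example: one feature mirrors the class, the other ignores it.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]
print(chi_squared(informative, labels))    # 8.0
print(chi_squared(uninformative, labels))  # 0.0
```

Ranking features by this statistic and keeping the highest-scoring ones is a common filter step before a more expensive subset evaluation.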
Abstract: Computer worm detection is commonly performed by
antivirus software tools that rely on prior explicit knowledge of the
worm's code (detection based on code signatures). We present an
approach for detection of the presence of computer worms based on
Artificial Neural Networks (ANN) using the computer's behavioral
measures. Significant features, which describe the
activity of a worm within a host, are commonly obtained from security
experts. We suggest acquiring these features by applying feature
selection methods. We compare three different feature selection
techniques for the dimensionality reduction and identification of the
most prominent features to capture efficiently the computer behavior
in the context of worm activity. Additionally, we explore three
different temporal representation techniques for the most prominent
features. In order to evaluate the different techniques, several
computers were infected with five different worms and 323 different
features of the infected computers were measured. We evaluated
each technique by preprocessing the dataset according to each one
and training the ANN model with the preprocessed data. We then
evaluated the ability of the model to detect the presence of a new
computer worm, in particular, during heavy user activity on the
infected computers.