Network Anomaly Detection using Soft Computing

One main drawback of intrusion detection system is the inability of detecting new attacks which do not have known signatures. In this paper we discuss an intrusion detection method that proposes independent component analysis (ICA) based feature selection heuristics and using rough fuzzy for clustering data. ICA is to separate these independent components (ICs) from the monitored variables. Rough set has to decrease the amount of data and get rid of redundancy and Fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining- (KDDCup 1999) dataset.

Feature Selection with Kohonen Self Organizing Classification Algorithm

In this paper a one-dimension Self Organizing Map algorithm (SOM) to perform feature selection is presented. The algorithm is based on a first classification of the input dataset on a similarity space. From this classification for each class a set of positive and negative features is computed. This set of features is selected as result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions. These datasets come from KDT applications, drug discovery as well as other applications. The knowledge of the correct classification available for the training and validation datasets is used to optimize the parameters for positive and negative feature extractions. The process becomes feasible for large and sparse datasets, as the ones obtained in KDT applications, by using both compression techniques to store the similarity matrix and speed up techniques of the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements make it feasible, by using the grid, the application of the methodology to massive datasets.

Accurate Fault Classification and Section Identification Scheme in TCSC Compensated Transmission Line using SVM

This paper presents a new approach for the protection of Thyristor-Controlled Series Compensator (TCSC) line using Support Vector Machine (SVM). One SVM is trained for fault classification and another for section identification. This method use three phase current measurement that results in better speed and accuracy than other SVM based methods which used single phase current measurement. This makes it suitable for real-time protection. The method was tested on 10,000 data instances with a very wide variation in system conditions such as compensation level, source impedance, location of fault, fault inception angle, load angle at source bus and fault resistance. The proposed method requires only local current measurement.

Anomaly Detection using Neuro Fuzzy system

As the network based technologies become omnipresent, demands to secure networks/systems against threat increase. One of the effective ways to achieve higher security is through the use of intrusion detection systems (IDS), which are a software tool to detect anomalous in the computer or network. In this paper, an IDS has been developed using an improved machine learning based algorithm, Locally Linear Neuro Fuzzy Model (LLNF) for classification whereas this model is originally used for system identification. A key technical challenge in IDS and LLNF learning is the curse of high dimensionality. Therefore a feature selection phase is proposed which is applicable to any IDS. While investigating the use of three feature selection algorithms, in this model, it is shown that adding feature selection phase reduces computational complexity of our model. Feature selection algorithms require the use of a feature goodness measure. The use of both a linear and a non-linear measure - linear correlation coefficient and mutual information- is investigated respectively

Reducing SAGE Data Using Genetic Algorithms

Serial Analysis of Gene Expression is a powerful quantification technique for generating cell or tissue gene expression data. The profile of the gene expression of cell or tissue in several different states is difficult for biologists to analyze because of the large number of genes typically involved. However, feature selection in machine learning can successfully reduce this problem. The method allows reducing the features (genes) in specific SAGE data, and determines only relevant genes. In this study, we used a genetic algorithm to implement feature selection, and evaluate the classification accuracy of the selected features with the K-nearest neighbor method. In order to validate the proposed method, we used two SAGE data sets for testing. The results of this study conclusively prove that the number of features of the original SAGE data set can be significantly reduced and higher classification accuracy can be achieved.

Ant Colony Optimization for Feature Subset Selection

The Ant Colony Optimization (ACO) is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It has recently attracted a lot of attention and has been successfully applied to a number of different optimization problems. Due to the importance of the feature selection problem and the potential of ACO, this paper presents a novel method that utilizes the ACO algorithm to implement a feature subset search procedure. Initial results obtained using the classification of speech segments are very promising.

Feature Subset Selection Using Ant Colony Optimization

Feature selection is an important step in many pattern classification problems. It is applied to select a subset of features, from a much larger set, such that the selected subset is sufficient to perform the classification task. Due to its importance, the problem of feature selection has been investigated by many researchers. In this paper, a novel feature subset search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.

A Hybrid Feature Selection by Resampling, Chi squared and Consistency Evaluation Techniques

In this paper a combined feature selection method is proposed which takes advantages of sample domain filtering, resampling and feature subset evaluation methods to reduce dimensions of huge datasets and select reliable features. This method utilizes both feature space and sample domain to improve the process of feature selection and uses a combination of Chi squared with Consistency attribute evaluation methods to seek reliable features. This method consists of two phases. The first phase filters and resamples the sample domain and the second phase adopts a hybrid procedure to find the optimal feature space by applying Chi squared, Consistency subset evaluation methods and genetic search. Experiments on various sized datasets from UCI Repository of Machine Learning databases show that the performance of five classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First Decision Tree and JRIP) improves simultaneously and the classification error for these classifiers decreases considerably. The experiments also show that this method outperforms other feature selection methods.

Improving Worm Detection with Artificial Neural Networks through Feature Selection and Temporal Analysis Techniques

Computer worm detection is commonly performed by antivirus software tools that rely on prior explicit knowledge of the worm-s code (detection based on code signatures). We present an approach for detection of the presence of computer worms based on Artificial Neural Networks (ANN) using the computer's behavioral measures. Identification of significant features, which describe the activity of a worm within a host, is commonly acquired from security experts. We suggest acquiring these features by applying feature selection methods. We compare three different feature selection techniques for the dimensionality reduction and identification of the most prominent features to capture efficiently the computer behavior in the context of worm activity. Additionally, we explore three different temporal representation techniques for the most prominent features. In order to evaluate the different techniques, several computers were infected with five different worms and 323 different features of the infected computers were measured. We evaluated each technique by preprocessing the dataset according to each one and training the ANN model with the preprocessed data. We then evaluated the ability of the model to detect the presence of a new computer worm, in particular, during heavy user activity on the infected computers.