Genetic Algorithm for Feature Subset Selection with Exploitation of Feature Correlations from Continuous Wavelet Transform: a real-case Application

A genetic algorithm (GA) based feature subset selection algorithm is proposed in which the correlation structure of the features is exploited. The subset of features is validated according to the classification performance. Features derived from the continuous wavelet transform are potentially strongly correlated. GA-s that do not take the correlation structure of features into account are inefficient. The proposed algorithm forms clusters of correlated features and searches for a good candidate set of clusters. Secondly a search within the clusters is performed. Different simulations of the algorithm on a real-case data set with strong correlations between features show the increased classification performance. Comparison is performed with a standard GA without use of the correlation structure.

Gene Selection Guided by Feature Interdependence

Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.