Abstract: Feature Selection is significant in order to perform constructive classification in the area of cancer diagnosis. However, a large number of features compared to the number of samples makes the task of classification computationally very hard and prone to errors in microarray gene expression datasets. In this paper, we present an innovative method for selecting highly informative gene subsets of gene expression data that effectively classifies the cancer data into tumorous and non-tumorous. The hybrid gene selection technique comprises of combined Mutual Information and Fisher score to select informative genes. The gene selection is validated by classification using Support Vector Machine (SVM) which is a supervised learning algorithm capable of solving complex classification problems. The results obtained from improved Mutual Information and F-Score with SVM as a classifier has produced efficient results.
Abstract: A DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. It is handled by clustering which reveals the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene based clustering in gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experiment results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS in benchmark datasets. To find the validation of the clustering results, this work is tested with one internal and one external cluster validation indexes.
Abstract: Microarray technology is universally used in the study
of disease diagnosis using gene expression levels. The main
shortcoming of gene expression data is that it includes thousands of
genes and a small number of samples. Abundant methods and
techniques have been proposed for tumor classification using
microarray gene expression data. Feature or gene selection methods
can be used to mine the genes that directly involve in the
classification and to eliminate irrelevant genes. In this paper
statistical measures like T-Statistics, Signal-to-Noise Ratio (SNR)
and F-Statistics are used to rank the genes. The ranked genes are used
for further classification. Particle Swarm Optimization (PSO)
algorithm and Shuffled Frog Leaping (SFL) algorithm are used to
find the significant genes from the top-m ranked genes. The Naïve
Bayes Classifier (NBC) is used to classify the samples based on the
significant genes. The proposed work is applied on Lung and Ovarian
datasets. The experimental results show that the proposed method
achieves 100% accuracy in all the three datasets and the results are
compared with previous works.
Abstract: Microarray gene expression data play a vital in biological processes, gene regulation and disease mechanism. Biclustering in gene expression data is a subset of the genes indicating consistent patterns under the subset of the conditions. Finding a biclustering is an optimization problem. In recent years, swarm intelligence techniques are popular due to the fact that many real-world problems are increasingly large, complex and dynamic. By reasons of the size and complexity of the problems, it is necessary to find an optimization technique whose efficiency is measured by finding the near optimal solution within a reasonable amount of time. In this paper, the algorithmic concepts of the Particle Swarm Optimization (PSO), Shuffled Frog Leaping (SFL) and Cuckoo Search (CS) algorithms have been analyzed for the four benchmark gene expression dataset. The experiment results show that CS outperforms PSO and SFL for 3 datasets and SFL give better performance in one dataset. Also this work determines the biological relevance of the biclusters with Gene Ontology in terms of function, process and component.
Abstract: An evolutionary method whose selection and recombination
operations are based on generalization error-bounds of
support vector machine (SVM) can select a subset of potentially
informative genes for SVM classifier very efficiently [7]. In this
paper, we will use the derivative of error-bound (first-order criteria)
to select and recombine gene features in the evolutionary process,
and compare the performance of the derivative of error-bound with
the error-bound itself (zero-order) in the evolutionary process. We
also investigate several error-bounds and their derivatives to compare
the performance, and find the best criteria for gene selection
and classification. We use 7 cancer-related human gene expression
datasets to evaluate the performance of the zero-order and first-order
criteria of error-bounds. Though both criteria have the same strategy
in theoretically, experimental results demonstrate the best criterion
for microarray gene expression data.
Abstract: This paper gives a novel method for improving
classification performance for cancer classification with very few
microarray Gene expression data. The method employs classification
with individual gene ranking and gene subset ranking. For selection
and classification, the proposed method uses the same classifier. The
method is applied to three publicly available cancer gene expression
datasets from Lymphoma, Liver and Leukaemia datasets. Three
different classifiers namely Support vector machines-one against all
(SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant
analysis (LDA) were tested and the results indicate the improvement
in performance of SVM-OAA classifier with satisfactory results on
all the three datasets when compared with the other two classifiers.
Abstract: The most common result of analysis of highthroughput
data in molecular biology represents a global list of
genes, ranked accordingly to a certain score. The score can be a
measure of differential expression. Recent work proposed a new
method for selecting a number of genes in a ranked gene list from
microarray gene expression data such that this set forms the
Optimally Functionally Enriched Network (OFTEN), formed by
known physical interactions between genes or their products. Here
we present calculation results of relative connectivity of genes from
META-OFTEN network and tentative biological interpretation of the
most reproducible signal. The relative connectivity and
inbetweenness values of genes from META-OFTEN network were
estimated.