An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions

This paper presents a novel method for improving cancer classification performance using very few microarray gene expressions. The method combines individual gene ranking with gene subset ranking, and it uses the same classifier for both gene selection and classification. The method is applied to three publicly available cancer gene expression datasets: Lymphoma, Liver and Leukaemia. Three classifiers were tested: support vector machines with the one-against-all strategy (SVM-OAA), K-nearest neighbour (KNN) and linear discriminant analysis (LDA). The results indicate that the SVM-OAA classifier performs better than the other two classifiers, with satisfactory results on all three datasets.
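The two-step selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how genes or subsets are scored, so the sketch assumes leave-one-out accuracy as the ranking criterion and uses a plain KNN classifier (one of the three classifiers named above) in place of SVM-OAA to stay dependency-free. The function names (`rank_genes`, `best_subset`), the parameter `max_size` and the toy data are all hypothetical.

```python
from itertools import combinations


def knn_predict(train_X, train_y, x, k=1):
    """Predict the label of x by majority vote among its k nearest training samples."""
    # Squared Euclidean distance from x to every training sample
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Highest vote count wins; ties go to the smallest label for determinism
    return max(sorted(votes), key=lambda lab: votes[lab])


def loo_accuracy(X, y, genes, k=1):
    """Leave-one-out accuracy of KNN restricted to the given gene indices (assumed criterion)."""
    reduced = [[row[g] for g in genes] for row in X]
    correct = 0
    for i in range(len(reduced)):
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, reduced[i], k) == y[i]:
            correct += 1
    return correct / len(reduced)


def rank_genes(X, y, k=1):
    """Step 1: rank genes individually by the accuracy each achieves on its own."""
    scores = [(loo_accuracy(X, y, [g], k), g) for g in range(len(X[0]))]
    # Best accuracy first; ties broken by lower gene index
    return [g for _, g in sorted(scores, key=lambda s: (-s[0], s[1]))]


def best_subset(X, y, top_genes, max_size=2, k=1):
    """Step 2: score subsets of the top-ranked genes with the same classifier."""
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for subset in combinations(top_genes, size):
            acc = loo_accuracy(X, y, list(subset), k)
            if acc > best[0]:  # keep the first (smallest) subset on ties
                best = (acc, subset)
    return best


# Toy data (hypothetical): 6 samples x 3 genes, two classes.
# Gene 0 separates the classes; genes 1 and 2 are mostly noise.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 1.0, 2.3],
    [0.9, 3.0, 8.0],
    [5.0, 4.9, 8.5],
    [5.1, 1.1, 7.9],
    [4.8, 3.1, 2.4],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, y)
best = best_subset(X, y, ranking[:2])
print("gene ranking:", ranking)
print("best subset (accuracy, genes):", best)
```

Because the subset search keeps a smaller subset whenever a larger one does not improve accuracy, the sketch favours the fewest genes that reach the best score, which matches the stated goal of classifying with very few genes. Swapping in another classifier only requires replacing `knn_predict`.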



