Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification

Tumor classification is a key area of research in the
field of bioinformatics. Microarray technology is commonly used in
the study of disease diagnosis using gene expression levels. The
main drawback of gene expression data is that it contains thousands
of genes but very few samples. Feature selection methods are used
to select the informative genes from the microarray. These methods
considerably improve the classification accuracy. In the proposed
method, a Genetic Algorithm (GA) is used for effective feature
selection. Informative genes are identified based on their t-statistic,
Signal-to-Noise Ratio (SNR) and F-test values. The initial candidate
solutions of the GA are obtained from the top-m informative genes. The
classification accuracy of the k-Nearest Neighbor (kNN) method is used
as the fitness function for the GA. In this work, kNN and Support Vector
Machine (SVM) are used as the classifiers. The experimental results
show that the proposed work is suitable for effective feature
selection. With the selected genes, the GA-kNN method achieves 100%
accuracy on 4 datasets and the GA-SVM method achieves 100% accuracy
on 5 of the 10 datasets. GA combined with kNN and with SVM is thus
demonstrated to be an accurate approach for microarray-based tumor
classification.
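
The filter step that ranks genes before the GA runs can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the textbook two-class definitions of SNR, the Welch t-statistic and the one-way ANOVA F-statistic are assumed, and the function names (snr_score, t_score, f_score, top_m_genes) and the toy matrix are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def snr_score(a, b):
    """Signal-to-noise ratio: |mean difference| over the sum of class std devs."""
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))

def t_score(a, b):
    """Absolute Welch t-statistic between two classes."""
    return abs(mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))

def f_score(*groups):
    """One-way ANOVA F-statistic: between-class over within-class mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)
    within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups) / (n - k)
    return between / within

def top_m_genes(X, y, m, score=snr_score):
    """Rank genes (columns of X) by a two-class filter score; return top-m indices.

    Assumes exactly two class labels and non-constant expression within each class.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        scores.append(score(a, b))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]

# Toy data (made up): 6 samples x 3 genes; only gene 0 separates the classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [0.9, 4.9, 0.4],
    [3.0, 5.0, 0.8],
    [3.1, 5.2, 0.3],
    [2.9, 4.8, 0.5],
]
y = [0, 0, 0, 1, 1, 1]
ranked = top_m_genes(X, y, m=2, score=snr_score)
```

Passing score=t_score or score=f_score swaps the ranking criterion without changing the rest of the pipeline. How the three rankings are combined into one top-m list is not stated in the abstract, so the sketch leaves them separate.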

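A GA wrapper with kNN accuracy as the fitness function could look like the sketch below. Several details are assumptions, since the abstract does not specify them: chromosomes are bit masks over the top-m candidate genes, fitness is leave-one-out kNN accuracy with k = 3, and the operators are binary tournament selection, one-point crossover, bit-flip mutation and elitism. All names (knn_loocv_accuracy, ga_select) and the toy data are hypothetical.

```python
import random
from collections import Counter

def knn_loocv_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of a k-NN classifier restricted to the given genes."""
    if not genes:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((xi[g] - xj[g]) ** 2 for g in genes), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        correct += votes.most_common(1)[0][0] == y[i]
    return correct / len(X)

def ga_select(X, y, candidates, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve bit masks over the candidate genes (needs at least two candidates)."""
    rng = random.Random(seed)
    n = len(candidates)

    def decode(mask):
        return [g for g, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return knn_loocv_accuracy(X, y, decode(mask))

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return decode(best), fitness(best)

# Toy data (made up): 8 samples x 4 genes; gene 0 carries the class signal.
X = [
    [0.0, 0.3, 0.7, 0.1],
    [0.1, 0.9, 0.2, 0.5],
    [0.2, 0.1, 0.4, 0.8],
    [0.1, 0.6, 0.9, 0.3],
    [10.0, 0.2, 0.8, 0.4],
    [10.1, 0.8, 0.1, 0.6],
    [10.2, 0.4, 0.5, 0.2],
    [9.9, 0.7, 0.3, 0.9],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = [0, 1, 2, 3]  # stand-in for the top-m genes from the filter step
genes, acc = ga_select(X, y, candidates)
```

To obtain a GA-SVM variant in the same spirit, the selected genes would be passed to an SVM classifier for the final evaluation. With so few samples, accuracy measured on the same data used for selection is optimistic, so a held-out or nested cross-validation estimate is the safer figure to report.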



