Statistical Measures and Optimization Algorithms for Gene Selection in Lung and Ovarian Tumor

Microarray technology is widely used to study disease diagnosis
from gene expression levels. The main shortcoming of gene
expression data is that it contains thousands of genes but only a
small number of samples. Many methods and techniques have been
proposed for tumor classification using microarray gene expression
data. Feature (gene) selection methods can be used to identify the
genes that are directly involved in the classification and to
eliminate irrelevant genes. In this paper, statistical measures,
namely T-Statistics, Signal-to-Noise Ratio (SNR) and F-Statistics,
are used to rank the genes. The Particle Swarm Optimization (PSO)
algorithm and the Shuffled Frog Leaping (SFL) algorithm are then
used to find the significant genes among the top-m ranked genes.
The Naïve Bayes Classifier (NBC) classifies the samples based on
these significant genes. The proposed work is applied to Lung and
Ovarian datasets. The experimental results show that the proposed
method achieves 100% accuracy on all three datasets, and the
results are compared with previous works.
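
The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-class SNR and t-statistic formulas are the commonly used definitions and are assumed here, the function names (snr_scores, t_scores, top_m) are hypothetical, and the data are synthetic. The PSO/SFL search and the Naïve Bayes classifier are not shown.

```python
import numpy as np

def snr_scores(X, y):
    """Signal-to-noise ratio per gene: (mean1 - mean0) / (sd1 + sd0).
    Assumed two-class definition; X is samples x genes, y holds 0/1 labels."""
    a, b = X[y == 1], X[y == 0]
    return (a.mean(axis=0) - b.mean(axis=0)) / (
        a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1)
    )

def t_scores(X, y):
    """Unequal-variance (Welch) t-statistic per gene; assumed variant."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def top_m(scores, m):
    """Indices of the m genes with the largest absolute score, best first."""
    return np.argsort(-np.abs(scores))[:m]

# Synthetic stand-in for a microarray matrix: 20 samples, 200 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, 5] += 6.0    # gene 5 is made strongly over-expressed in class 1
X[y == 1, 17] -= 6.0   # gene 17 is made strongly under-expressed in class 1

ranked_snr = top_m(snr_scores(X, y), 10)
ranked_t = top_m(t_scores(X, y), 10)
print("top-10 genes by SNR:", ranked_snr)
print("top-10 genes by t-statistic:", ranked_t)
```

In a pipeline like the one summarized in the abstract, the indices returned by top_m would define the reduced search space passed on to the optimization stage.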




