A Comparison of SVM-based Criteria in Evolutionary Method for Gene Selection and Classification of Microarray Data

An evolutionary method whose selection and recombination operations are based on generalization error bounds of the support vector machine (SVM) can select a subset of potentially informative genes for an SVM classifier very efficiently [7]. In this paper, we use the derivative of the error bound (first-order criterion) to select and recombine gene features in the evolutionary process, and compare its performance with that of the error bound itself (zero-order criterion). We also investigate several error bounds and their derivatives to find the best criterion for gene selection and classification. We use seven cancer-related human gene expression datasets to evaluate the zero-order and first-order criteria. Although both criteria follow the same strategy in theory, the experimental results identify the best criterion for microarray gene expression data.
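
To make the distinction between the two kinds of criteria concrete, the following is a minimal sketch and not the method evaluated in this paper. It assumes the simple quantity J = ||w||^2 as a stand-in for an error bound, a Gaussian kernel with one scaling factor nu_k per gene, and dual coefficients alpha that are held fixed. The zero-order score of gene k is |J - J^(-k)|, the change in J when the gene is removed; the first-order score is |dJ/d nu_k| evaluated at nu = 1. All function names, the toy data, and the alpha values are hypothetical, and the evolutionary selection and recombination steps are not shown.

```python
import math

def rbf_kernel(X, gamma, scale):
    """Gaussian kernel matrix with per-gene scaling factors `scale` (assumed form)."""
    n = len(X)
    return [[math.exp(-gamma * sum((s * (a - b)) ** 2
                                   for s, a, b in zip(scale, X[i], X[j])))
             for j in range(n)] for i in range(n)]

def w_norm_sq(alpha, y, K):
    """J = ||w||^2 = sum_ij alpha_i alpha_j y_i y_j K_ij, with alpha held fixed."""
    n = len(alpha)
    return sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))

def zero_order(X, y, alpha, gamma):
    """Zero-order score |J - J^(-k)|: change in J when gene k is removed."""
    d = len(X[0])
    full = w_norm_sq(alpha, y, rbf_kernel(X, gamma, [1.0] * d))
    scores = []
    for k in range(d):
        # Setting the scaling factor of gene k to zero removes it from the kernel.
        scale = [0.0 if l == k else 1.0 for l in range(d)]
        reduced = w_norm_sq(alpha, y, rbf_kernel(X, gamma, scale))
        scores.append(abs(full - reduced))
    return scores

def first_order(X, y, alpha, gamma):
    """First-order score |dJ/d nu_k| at nu = 1, from the analytic kernel derivative."""
    n, d = len(X), len(X[0])
    K = rbf_kernel(X, gamma, [1.0] * d)
    scores = []
    for k in range(d):
        # dK_ij/d nu_k at nu = 1 equals -2 * gamma * (x_ik - x_jk)^2 * K_ij.
        grad = sum(alpha[i] * alpha[j] * y[i] * y[j]
                   * (-2.0 * gamma * (X[i][k] - X[j][k]) ** 2) * K[i][j]
                   for i in range(n) for j in range(n))
        scores.append(abs(grad))
    return scores

# Toy data (hypothetical): 4 samples, 3 genes. Gene 0 separates the classes,
# gene 1 is noise, gene 2 is constant. The alpha values are made up, not trained.
X = [[2.0, 0.1, 1.0], [1.8, -0.2, 1.0], [-2.1, 0.0, 1.0], [-1.9, 0.3, 1.0]]
y = [1, 1, -1, -1]
alpha = [0.5, 0.5, 0.5, 0.5]
gamma = 0.1

zero = zero_order(X, y, alpha, gamma)
first = first_order(X, y, alpha, gamma)
print("zero-order scores: ", zero)
print("first-order scores:", first)
```

In this sketch both scores vanish for the constant gene and both are largest for the discriminative gene, but their magnitudes differ. This is one way to see why the two criteria can behave differently in practice even though they follow the same ranking strategy.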




References:
[1] T. Jirapech-Umpai and S. Aitken, "Feature selection and classification
for microarray data analysis: Evolutionary methods for identifying
predictive genes", BMC Bioinformatics, vol. 6, no. 148, 2005.
[2] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, "Gene selection
for cancer classification using support vector machines", Machine
Learning, vol. 46, pp. 389-422, 2002.
[3] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio and V.
Vapnik, "Feature selection for SVMs", Advances in Neural Information
Processing Systems 13, 2001.
[4] H.-L. Huang and F.-L. Chang, "ESVM: Evolutionary support vector
machine for automatic feature selection and classification of microarray
data", BioSystems, vol. 90, pp. 516-528, 2007.
[5] A. Rakotomamonjy, "Variable selection using SVM-based criteria",
Journal of Machine Learning Research, vol. 3, pp. 1357-1370, 2003.
[6] A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin and S. Levy,
"A comprehensive evaluation of multicategory classification methods
for microarray gene expression cancer diagnosis", Bioinformatics, vol.
21, no. 5, pp. 631-643, 2005.
[7] R. Debnath and T. Kurita, "An evolutionary approach for gene selection
and classification of microarray data based on SVM error-bound
theories", BioSystems, vol. 100, no. 1, pp. 39-46, 2010.
[8] M. Opper and O. Winther, "Gaussian process and SVM: Mean field
and leave-one-out", in A. Smola, P. Bartlett, B. Schölkopf and D.
Schuurmans (Eds.), Advances in Large Margin Classifiers, Cambridge,
MA: MIT Press, pp. 311-326, 2000.
[9] T. S. Jaakkola and D. Haussler, "Probabilistic kernel regression models",
in Proc. 1999 Conference on AI and Statistics, Florida, USA, 1999.
[10] X. Zhou and D. P. Tuck, "Gene selection using a new error bound
for support vector machines", in Proc. Eleventh Annual International
Conference on Research in Computational Molecular Biology, San
Francisco, USA, 2007.
[11] O. Chapelle, V. Vapnik, O. Bousquet and S. Mukherjee, "Choosing multiple
parameters for support vector machines", Machine Learning, vol.
46, pp. 131-159, 2002.
[12] V. Vapnik, Statistical Learning Theory, New York: Wiley, 1998.