Feature Subset Selection Approach Based on Maximizing Margin of Support Vector Classifier

Identifying cancer genes that can predict the clinical behavior of different types of cancer is challenging because of the very large number of genes and the small number of patient samples. This paper proposes a new feature subset selection method based on a supervised classifier, the support vector machine (SVM). The method introduces the Maximized Margin (MM) into the subset selection criterion, which makes it possible to approach the lowest generalization error rate. In class prediction problems, gene selection is essential both for improving accuracy and for identifying the genes associated with the disease. The performance of the new method was evaluated in experiments on real-world data, where it gave better classification accuracy.
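The idea behind a margin criterion can be illustrated in its simplest form. The sketch below is not the algorithm proposed in the paper: it assumes a simplification in which each gene is scored on its own by the width of the gap that a one-dimensional hard-margin separator would have, whereas the proposed method evaluates feature subsets with an SVM. The function names scaled_margin and rank_genes and the toy expression values are hypothetical and chosen only for illustration.

```python
def scaled_margin(values, labels):
    """Gap between the two classes along one feature after min-max
    scaling to [0, 1]; returns 0.0 when the classes overlap."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # constant feature carries no class information
    pos = [(v - lo) / (hi - lo) for v, y in zip(values, labels) if y == 1]
    neg = [(v - lo) / (hi - lo) for v, y in zip(values, labels) if y == -1]
    # In one dimension a hard-margin separator exists only if one class
    # lies entirely above the other; the margin is then the gap width.
    gap = max(min(pos) - max(neg), min(neg) - max(pos))
    return max(gap, 0.0)


def rank_genes(X, y, k):
    """Return the indices of the k genes with the widest scaled margin
    (ties broken by the lower gene index)."""
    n_genes = len(X[0])
    scores = [(scaled_margin([row[j] for row in X], y), j)
              for j in range(n_genes)]
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns).
# Gene 0 separates the classes widely, gene 1 narrowly, gene 2 not at all.
X = [
    [0.1, 3.0, 2.0],
    [0.2, 3.2, 2.5],
    [0.3, 3.4, 9.0],
    [0.9, 3.6, 3.0],
    [1.0, 4.0, 8.0],
    [0.8, 5.0, 2.2],
]
y = [-1, -1, -1, 1, 1, 1]

selected = rank_genes(X, y, k=2)
print(selected)
```

Min-max scaling is used here so that genes measured on different scales can be compared. A subset-based criterion would instead train a classifier on each candidate subset and compare the resulting margins, which this per-gene sketch does not attempt.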



