Gene Selection Guided by Feature Interdependence

Cancers are typically marked by a number of differentially
expressed genes that show enormous potential as biomarkers for
the disease. In recent years, cancer classification based on gene
expression profiles derived from high-throughput microarrays has
been widely used. The selection of discriminative genes is,
therefore, an essential preprocessing step in
carcinogenesis studies. In this paper, we propose a novel gene
selector that uses information-theoretic measures for biological
discovery. This multivariate filter is a four-stage framework
comprising analyses of feature relevance, feature interdependence,
feature redundancy-dependence, and subset rankings, and it has been
examined on the colon cancer data set. Our experimental results show
that the proposed method outperformed other information-theoretic
filters in all aspects of classification error and classification
performance.




