Imputation Technique for Feature Selection in Microarray Data Set

Analyzing DNA microarray data sets is a major challenge for
bioinformaticians because of the complexity of applying statistical
and machine learning techniques to them. The challenge is compounded
when the microarray data sets contain missing values, which occurs
regularly, because most of these techniques cannot handle missing
data. One of the most important data analysis processes applied to
microarray data sets is feature selection, which identifies the genes
most strongly associated with a given disease. In this paper, we
introduce a technique for imputing the missing data in microarray
data sets while performing feature selection.
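
To make the problem setting concrete, the following minimal sketch shows a gene-by-sample expression matrix with missing entries, a simple row-mean imputation, and a filter-style gene ranking. It is only an illustration and does not reproduce the technique introduced in this paper: it imputes first and selects afterwards, and the function names, the toy data, and the class-mean-gap score are all assumptions made for the example.

```python
import math

def impute_row_means(matrix):
    """Replace NaN entries in each gene (row) with the mean of that row's observed values."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if not math.isnan(v)]
        # Fall back to 0.0 only if a gene has no observed value at all (assumption).
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if math.isnan(v) else v for v in row])
    return imputed

def rank_genes_by_class_gap(matrix, labels):
    """Rank gene indices by the absolute difference between the two class means."""
    scores = []
    for idx, row in enumerate(matrix):
        pos = [v for v, y in zip(row, labels) if y == 1]
        neg = [v for v, y in zip(row, labels) if y == 0]
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        scores.append((gap, idx))
    # Largest gap first; ties broken by gene index.
    return [idx for gap, idx in sorted(scores, key=lambda s: (-s[0], s[1]))]

nan = float("nan")
# Toy data (hypothetical): rows are genes, columns are samples.
expression = [
    [1.0, nan, 5.0, 6.0],  # gene 0
    [2.0, 2.0, nan, 2.0],  # gene 1
    [0.0, 1.0, 4.0, nan],  # gene 2
]
labels = [0, 0, 1, 1]  # class label of each sample

completed = impute_row_means(expression)
ranking = rank_genes_by_class_gap(completed, labels)
```

Here `ranking` lists gene indices from most to least discriminative under the toy score. A sequential pipeline like this lets imputation errors propagate into the selection step, which is the kind of interaction that motivates handling both tasks together.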




