BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis
Biclustering is a very useful data mining technique for identifying
patterns in gene expression analysis where different genes are
correlated across a subset of conditions. Association rule mining is an
efficient approach to biclustering, as in the BIMODULE algorithm, but it
is sensitive to the values given to its input parameters and to the
discretization procedure used in the preprocessing step. In addition,
when noise is present, classical association rule miners discover
multiple small fragments of the true bicluster but miss the true
bicluster itself. This paper formally presents a generalized
noise-tolerant bicluster model, termed μBicluster. An iterative
algorithm based on the proposed model, termed BIDENS, is introduced that
can discover a set of k possibly overlapping biclusters simultaneously.
Our model uses a more flexible method to partition the dimensions in
order to preserve meaningful and significant biclusters. The proposed
algorithm can discover biclusters that are hard for BIMODULE to
discover. An experimental study on yeast and human gene expression data
and several artificial datasets shows that our algorithm offers
substantial improvements over several previously proposed biclustering
algorithms.
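The fragmentation problem described above can be reproduced on a toy binary matrix. The sketch below is illustrative only: the 6 × 6 matrix, the single flipped cell, and the helper names (`supporting_rows`, `closure`, `exact_fragments`, `density`) are hypothetical, and the density value printed at the end is a generic fault-tolerance measure, not the μBicluster model defined in the paper. A brute-force closed-itemset enumeration (standing in for an efficient exact miner) reports two overlapping fragments, while the true 4 × 4 bicluster, which contains one noisy zero, is never reported even though 15 of its 16 cells are ones.

```python
from itertools import combinations


def supporting_rows(matrix, cols):
    """Rows (genes) that contain a 1 in every column (condition) of cols."""
    return frozenset(r for r, row in enumerate(matrix) if all(row[c] == 1 for c in cols))


def closure(matrix, rows):
    """Columns that are 1 in every row of rows, i.e. the closed itemset."""
    n_cols = len(matrix[0])
    return frozenset(c for c in range(n_cols) if all(matrix[r][c] == 1 for r in rows))


def exact_fragments(matrix, min_rows=2, min_cols=2):
    """Closed all-ones submatrices, as an exact closed-itemset miner would report them."""
    n_cols = len(matrix[0])
    found = set()
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            rows = supporting_rows(matrix, cols)
            if len(rows) < min_rows:
                continue
            closed_cols = closure(matrix, rows)
            if len(closed_cols) >= min_cols:
                found.add((rows, closed_cols))
    return found


def density(matrix, rows, cols):
    """Fraction of 1s inside the submatrix rows x cols (hypothetical tolerance measure)."""
    rows, cols = list(rows), list(cols)
    ones = sum(matrix[r][c] for r in rows for c in cols)
    return ones / (len(rows) * len(cols))


# Hypothetical toy data: a 6x6 zero background with a planted 4x4 bicluster
# in rows 0-3 and columns 0-3, plus one noisy cell inside the bicluster.
matrix = [[1 if r < 4 and c < 4 else 0 for c in range(6)] for r in range(6)]
matrix[1][2] = 0

fragments = exact_fragments(matrix)
for rows, cols in sorted(fragments, key=lambda f: (sorted(f[0]), sorted(f[1]))):
    print(sorted(rows), sorted(cols))
# [0, 1, 2, 3] [0, 1, 3]
# [0, 2, 3] [0, 1, 2, 3]

print(density(matrix, range(4), range(4)))  # 0.9375
```

A tolerant criterion such as "accept a submatrix whose density is at least 0.9" would recover the planted bicluster here, but the threshold 0.9 is an arbitrary choice for this sketch; picking it is the same kind of parameter-sensitivity question the abstract raises for BIMODULE.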
@article{"International Journal of Information, Control and Computer Sciences:62618",
  author = "Mohamed A. Mahfouz and M. A. Ismail",
  title = "BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis",
  abstract = "Biclustering is a very useful data mining technique for identifying
patterns in gene expression analysis where different genes are
correlated across a subset of conditions. Association rule mining is an
efficient approach to biclustering, as in the BIMODULE algorithm, but it
is sensitive to the values given to its input parameters and to the
discretization procedure used in the preprocessing step. In addition,
when noise is present, classical association rule miners discover
multiple small fragments of the true bicluster but miss the true
bicluster itself. This paper formally presents a generalized
noise-tolerant bicluster model, termed μBicluster. An iterative
algorithm based on the proposed model, termed BIDENS, is introduced that
can discover a set of k possibly overlapping biclusters simultaneously.
Our model uses a more flexible method to partition the dimensions in
order to preserve meaningful and significant biclusters. The proposed
algorithm can discover biclusters that are hard for BIMODULE to
discover. An experimental study on yeast and human gene expression data
and several artificial datasets shows that our algorithm offers
substantial improvements over several previously proposed biclustering
algorithms.",
  keywords = "Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.",
  volume = "3",
  number = "1",
  pages = "186-7",
}