BIDENS: An Iterative Density-Based Biclustering Algorithm with Application to Gene Expression Analysis

In gene expression analysis, biclustering is a useful data mining technique for identifying patterns in which a subset of genes is correlated across a subset of conditions. Association rule mining, as used in the BIMODULE algorithm, is an efficient approach to biclustering, but it is sensitive to the values of its input parameters and to the discretization procedure used in the preprocessing step. Moreover, when noise is present, classical association rule miners discover multiple small fragments of the true bicluster but miss the true bicluster itself. This paper formally presents a generalized noise-tolerant bicluster model, termed μBicluster. Based on this model, we introduce an iterative algorithm, BIDENS, that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method of partitioning the dimensions in order to preserve meaningful and significant biclusters. The proposed algorithm can discover biclusters that are hard for BIMODULE to find. Experiments on yeast and human gene expression data and on several artificial datasets show that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.
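The fragmentation problem can be reproduced on a toy example. The sketch below assumes an already binarized (discretized) matrix with a planted 4 × 4 bicluster and two noise flips. It enumerates maximal all-ones submatrices by brute force over column subsets. This is a stand-in for what an exact, fault-intolerant itemset miner reports, not BIMODULE itself. It then contrasts that result with a simple density threshold. The helper names and the `min_density` parameter are hypothetical. The abstract does not spell out the μBicluster criterion, so the density test only illustrates noise tolerance in general and is not the paper's model.

```python
from itertools import combinations

# Toy binarized expression matrix (rows = genes, columns = conditions).
# Rows 0-3 x columns 0-3 form the planted bicluster; cells (1, 2) and
# (3, 0) were flipped to 0 to simulate noise.
MATRIX = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
TRUE_ROWS = frozenset(range(4))
TRUE_COLS = frozenset(range(4))


def supporting_rows(matrix, cols):
    """Rows that contain a 1 in every column of `cols` (exact support)."""
    return frozenset(r for r, row in enumerate(matrix) if all(row[c] for c in cols))


def exact_fragments(matrix, min_rows=2, min_cols=2):
    """Maximal all-ones submatrices, found by brute force over column subsets.

    Mimics the output of a fault-intolerant itemset miner; only feasible
    for tiny matrices.
    """
    n_cols = len(matrix[0])
    candidates = set()
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            rows = supporting_rows(matrix, cols)
            if len(rows) >= min_rows:
                candidates.add((rows, frozenset(cols)))
    # Keep only candidates not strictly contained in another candidate.
    return [
        (r, c) for r, c in candidates
        if not any(r <= r2 and c <= c2 and (r, c) != (r2, c2) for r2, c2 in candidates)
    ]


def density(matrix, rows, cols):
    """Fraction of 1s inside the submatrix rows x cols."""
    ones = sum(matrix[r][c] for r in rows for c in cols)
    return ones / (len(rows) * len(cols))


def is_dense_bicluster(matrix, rows, cols, min_density=0.85):
    """Hypothetical noise-tolerant test (not the paper's μBicluster definition)."""
    return density(matrix, rows, cols) >= min_density


if __name__ == "__main__":
    fragments = exact_fragments(MATRIX)
    for rows, cols in sorted(fragments, key=lambda f: (sorted(f[0]), sorted(f[1]))):
        print("fragment rows", sorted(rows), "cols", sorted(cols))
    print("true bicluster found exactly:", (TRUE_ROWS, TRUE_COLS) in fragments)
    print("density of true bicluster:", density(MATRIX, TRUE_ROWS, TRUE_COLS))
    print("accepted by density test:", is_dense_bicluster(MATRIX, TRUE_ROWS, TRUE_COLS))
```

On this matrix the exact search returns four overlapping fragments: rows {0,1,2,3} × columns {1,3}, rows {0,1,2} × columns {0,1,3}, rows {0,2,3} × columns {1,2,3}, and rows {0,2} × columns {0,1,2,3}. None of these is the planted 4 × 4 block. The block's density of 0.875 passes the illustrative 0.85 threshold.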