An Integrated Predictor for Cis-Regulatory Modules

Various cis-regulatory module (CRM) predictors have been proposed in the last decade. Well-established CRM predictors adopt different categories of prediction strategy, including window clustering, probabilistic modeling and phylogenetic footprinting. Integrating them appropriately has the potential to achieve high-quality CRM prediction. This study analyzed four existing CRM predictors (ClusterBuster, MSCAN, CisModule and MultiModule) to find a predictor combination that delivers higher accuracy than any individual CRM predictor. The integrated CRM predictor proposed in this study was evaluated on 465 CRMs across 140 Drosophila melanogaster genes from the REDfly database. The results show that four predictor combinations outperformed the best individual CRM predictor.
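
The abstract does not state how the individual predictions are combined or how accuracy is measured. As a minimal sketch only, one simple possibility is position-wise voting: a nucleotide is reported as part of a CRM when at least a chosen number of predictors cover it, and the result is scored against known CRM positions. The function names, the `min_votes` threshold, the choice of sensitivity and positive predictive value as metrics, and the toy coordinates below are all assumptions made for illustration, not the method or data of this study.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one sequence


def covered_positions(intervals: List[Interval]) -> Set[int]:
    """Expand a list of intervals into the set of positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def combine_by_vote(predictions: Dict[str, List[Interval]], min_votes: int) -> Set[int]:
    """Keep positions covered by at least `min_votes` predictors (hypothetical rule)."""
    votes: Dict[int, int] = {}
    for intervals in predictions.values():
        # Each predictor contributes at most one vote per position.
        for pos in covered_positions(intervals):
            votes[pos] = votes.get(pos, 0) + 1
    return {pos for pos, count in votes.items() if count >= min_votes}


def sensitivity_and_ppv(predicted: Set[int], known: Set[int]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity and positive predictive value."""
    true_pos = len(predicted & known)
    sensitivity = true_pos / len(known) if known else 0.0
    ppv = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy input: made-up intervals, keyed by predictor name for readability only.
toy_predictions = {
    "ClusterBuster": [(100, 200)],
    "MSCAN": [(150, 250)],
    "CisModule": [(180, 300)],
}
toy_known = covered_positions([(160, 240)])  # made-up annotated CRM

combined = combine_by_vote(toy_predictions, min_votes=2)
sens, ppv = sensitivity_and_ppv(combined, toy_known)
```

Raising `min_votes` trades sensitivity for precision, which is one reason different predictor combinations can rank differently depending on the metric used.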




