Full-genomic Network Inference for Non-model Organisms: A Case Study for the Fungal Pathogen Candida albicans

Reverse engineering of full-genomic interaction networks from compendia of expression data has been applied successfully to a number of model organisms. This study adapts these approaches to an important non-model organism: the major human fungal pathogen Candida albicans. During infection, the pathogen adapts to a wide range of environmental niches and reversibly changes its growth form. Given the importance of these processes, we need to understand how they are regulated. This study presents a reverse engineering strategy that infers full-genomic interaction networks for C. albicans using linear regression with a sparseness criterion (LASSO). To compensate for the limited amount of expression data and the small number of known interactions, we use several sources of prior knowledge to guide the network inference towards a knowledge-driven solution. Since no database of known interactions exists for C. albicans, we use a text-mining system that extracts known regulatory interactions from full-text research papers. By comparing inferred networks with these known regulatory interactions, we determine optimal values for the global modelling parameters that weight the influence of the sparseness criterion and of the prior knowledge. Furthermore, we show that soft integration of prior knowledge further improves performance. Finally, we compare the performance of our approach with that of state-of-the-art network inference approaches.
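To make the idea of a sparseness criterion combined with softly integrated prior knowledge concrete, the following is a minimal sketch and not the implementation used in this study. It assumes that prior knowledge enters as a per-regulator penalty weight: candidate regulators supported by prior knowledge receive a weight below 1, so their coefficients are shrunk less. The function names, the weighting scheme, the toy data and the labels "regulator A" and "regulator B" are all hypothetical; the solver is plain coordinate descent for a weighted LASSO objective.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.

    X       : list of n samples, each a list of p regulator expression values
    y       : list of n expression values of the target gene
    lam     : global sparseness parameter (assumed name)
    weights : per-regulator penalty weights; values < 1 encode prior support
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 initially
    scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # constant-zero regulator carries no information
            # Correlation of regulator j with the partial residual.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam * weights[j]) / scale[j]
            change = new_beta - beta[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                beta[j] = new_beta
    return beta


# Toy data (hypothetical): the target gene depends strongly on regulator A
# (column 0) and weakly on regulator B (column 1).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0 * a + 0.5 * b for a, b in X]

# Without prior knowledge, the weak edge from regulator B is removed.
beta_no_prior = weighted_lasso(X, y, lam=1.0, weights=[1.0, 1.0])

# With a reduced penalty for regulator B (assumed prior weight 0.2),
# the weak edge survives the sparseness criterion.
beta_with_prior = weighted_lasso(X, y, lam=1.0, weights=[1.0, 0.2])

print(beta_no_prior, beta_with_prior)
```

In this sketch the two global parameters correspond to `lam` and to the weight assigned to prior-supported regulators. One plausible way to tune them, in the spirit of the comparison with text-mined interactions described above, is to run the inference over a grid of both values and keep the pair whose inferred edges best agree with the known interactions.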




