Protein Length and Phenotypic Potential

Mendelian disease genes represent a collection of single points of failure for the various systems they constitute. Such genes have been shown, on average, to encode longer proteins than non-disease genes. Existing models suggest that this results from the increased likelihood of longer genes undergoing mutations. Here, we show that in saturated mutagenesis experiments performed on model organisms, where the probability of each gene being mutated is one, a similar relationship holds between protein length and the probability that mutating the gene is lethal. We thus suggest an extended model in which the likelihood that a mutated gene produces a severe phenotype is itself length-dependent. Using the occurrence of conserved domains, we provide evidence that this dependency results from a correlation between the length of a protein and the number of functions it performs. We propose that protein length thus serves as a proxy for a protein's cardinality in the different networks required for the organism's survival and well-being. We use this example to argue that the collection of Mendelian disease genes can, and should, be used to study the rules governing systems vulnerability in living organisms.
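
As an illustration of the kind of comparison described above, the fraction of lethal genes can be tabulated across protein-length bins once every gene is known to have been mutated. The Python sketch below is only a minimal example under stated assumptions: the function name, the (length, lethal) record layout, and the toy values are hypothetical and are not the data or the analysis used in this work.

```python
from typing import List, Tuple


def lethal_fraction_by_length_bin(
    records: List[Tuple[int, bool]], n_bins: int = 4
) -> List[Tuple[int, int, float]]:
    """Sort genes by protein length, split them into n_bins near-equal groups,
    and return (min_length, max_length, fraction_lethal) for each group."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    result = []
    for b in range(n_bins):
        # Integer arithmetic keeps group sizes within one gene of each other.
        group = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not group:
            continue  # more bins than genes: skip empty groups
        lethal = sum(1 for _, is_lethal in group if is_lethal)
        result.append((group[0][0], group[-1][0], lethal / len(group)))
    return result


# Hypothetical toy records, NOT real data:
# (protein length in residues, whether mutating the gene is lethal)
toy = [(120, False), (150, False), (300, False), (340, True),
       (600, False), (650, True), (1200, True), (1500, True)]

for shortest, longest, fraction in lethal_fraction_by_length_bin(toy, n_bins=4):
    print(f"{shortest}-{longest} residues: {fraction:.2f} lethal")
```

Because every gene in such a table has been mutated, a rising lethal fraction across bins cannot be attributed to longer genes simply being hit more often, which is the distinction the extended model draws.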




