Protein Graph Partitioning by Mutually Maximization of cycle-distributions

The classification of the protein structure is commonly not performed for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step to a protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph which nodes correspond the secondary structure elements of the protein. This graph is call the protein graph. The domains are then identified as partitions of the graph corresponding to vertices sets obtained by the maximization of an objective function, which mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not utilize any other kind of information besides the cycle-distribution to find the partitions. If a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. As stop criterion, we calculate numerically a significance level which indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates automatically its iterative application. We present results for one and two domain proteins and compare our results with the manually assigned domains by the SCOP database and differences are discussed.




References:
[1] H. Kopka and P. W. Daly, A Guide to LATEX, 3rd ed. Harlow, England:
Addison-Wesley, 1999.
[2] A. Andreeva, D. Howorth, S.E. Brenner, T.J. Hubbard, C. Chothia and
A.G. Murzin, SCOP database in 2004: refinements integrate structure and
sequence fmily data, Nucleic Acids Res., 32:D226-229, 2004.
[3] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr, M.D. Brice,
J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi, The Protein Data
Bank: A computer-based archival file for macromolecular structures. J.
Mol. Biol., 112:535-542, 1977.
[4] R.F. Doolittle, The multiplicity of domains in proteins, Annu. Rev.
Biochem., 64:287-314, 1995.
[5] J-t. Guo, D. Xu, D. Kim and Y.Xu, Improving the performance of
DomainParser for structural domain partition using neural network, Nucl.
Acids Res., 31(3)944-952, 2003.
[6] L. Holm and C. Sander, Dictionary of recurrent domains in protein
structures, Proteins: Structure, Function and Genetics, 33:88-96, 1998.
[7] P.J. Kraulis, Molscript: A program to produce both detailed and schematic
plots of protein sturctures, J. of Appl. Chrystallograph, 24:946-950, 1991.
[8] N.J. Mulder et al., InterPro, progress and status in 2005, Nucleic Acids
Res., 33:D201-205, 2005.
[9] A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, SCOP: a structural
classification of proteins database for the investigation of sequences
and structures, J. Mol. Biol., 247:536-540, 1995.
[10] D.C. Phillips, The dree-dimensional structure of an enzyme molecule,
Sci. Am., 215:78-90, 1966.
[11] A.S. Siddiqui and G.J. Barton, Contious and discontinuous domains:
An algorithm for the automatic generation of reliable protein domain
definietions, Protein Science, 4:872-884, 1995.
[12] G.D. Rose, Hierachic organization of domains in globular proteins, J.
Mol. Biol., 134(3):447-470, 1979.
[13] M.G. Rossmann and A. Liljas, Recognition of structural domains in
globular proteins, J. Mol. Biol., 85:177-181, 1974
[14] W.R. Taylor, Protein structural domain identification, Protein Eng.,
12(3):203-216, 1999.
[15] S. Veretnik, P.E. Bourne, N.N. Alexandrov and I.N Shindyalov, Toward
consistent assignment of structural domains in proteins, J. Mol. Biol.,
339:647-678, 2004.
[16] D. Vitkup, E. Melamud, J. Moult and C. Sander, Completness in
structural genomics, Nat. Struct. Biol., 8(6):559-566, 2001.
[17] D. Wetlaufer, Nucleation, rapid folding, and globular intrachain regions
in proteins, Proc. Natl. Acad. Sci., 70:697-701, 1973
[18] Y. Xu, D. Xu and H.N. Gabow, Protein domain decomposition using a
graph-theoretic apprach, Bioinformatics, 16(12):1091-1104, 2000.