Protein Graph Partitioning by Mutually Maximization of cycle-distributions
The classification of the protein structure is commonly
not performed for the whole protein but for structural domains, i.e.,
compact functional units preserved during evolution. Hence, a first
step to a protein structure classification is the separation of the
protein into its domains. We approach the problem of protein domain
identification by proposing a novel graph theoretical algorithm. We
represent the protein structure as an undirected, unweighted and
unlabeled graph which nodes correspond the secondary structure
elements of the protein. This graph is call the protein graph. The
domains are then identified as partitions of the graph corresponding
to vertices sets obtained by the maximization of an objective function,
which mutually maximizes the cycle distributions found in the
partitions of the graph. Our algorithm does not utilize any other kind
of information besides the cycle-distribution to find the partitions. If
a partition is found, the algorithm is iteratively applied to each of
the resulting subgraphs. As stop criterion, we calculate numerically
a significance level which indicates the stability of the predicted
partition against a random rewiring of the protein graph. Hence,
our algorithm terminates automatically its iterative application. We
present results for one and two domain proteins and compare our
results with the manually assigned domains by the SCOP database
and differences are discussed.
[1] H. Kopka and P. W. Daly, A Guide to LATEX, 3rd ed. Harlow, England:
Addison-Wesley, 1999.
[2] A. Andreeva, D. Howorth, S.E. Brenner, T.J. Hubbard, C. Chothia and
A.G. Murzin, SCOP database in 2004: refinements integrate structure and
sequence fmily data, Nucleic Acids Res., 32:D226-229, 2004.
[3] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr, M.D. Brice,
J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi, The Protein Data
Bank: A computer-based archival file for macromolecular structures. J.
Mol. Biol., 112:535-542, 1977.
[4] R.F. Doolittle, The multiplicity of domains in proteins, Annu. Rev.
Biochem., 64:287-314, 1995.
[5] J-t. Guo, D. Xu, D. Kim and Y.Xu, Improving the performance of
DomainParser for structural domain partition using neural network, Nucl.
Acids Res., 31(3)944-952, 2003.
[6] L. Holm and C. Sander, Dictionary of recurrent domains in protein
structures, Proteins: Structure, Function and Genetics, 33:88-96, 1998.
[7] P.J. Kraulis, Molscript: A program to produce both detailed and schematic
plots of protein sturctures, J. of Appl. Chrystallograph, 24:946-950, 1991.
[8] N.J. Mulder et al., InterPro, progress and status in 2005, Nucleic Acids
Res., 33:D201-205, 2005.
[9] A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, SCOP: a structural
classification of proteins database for the investigation of sequences
and structures, J. Mol. Biol., 247:536-540, 1995.
[10] D.C. Phillips, The dree-dimensional structure of an enzyme molecule,
Sci. Am., 215:78-90, 1966.
[11] A.S. Siddiqui and G.J. Barton, Contious and discontinuous domains:
An algorithm for the automatic generation of reliable protein domain
definietions, Protein Science, 4:872-884, 1995.
[12] G.D. Rose, Hierachic organization of domains in globular proteins, J.
Mol. Biol., 134(3):447-470, 1979.
[13] M.G. Rossmann and A. Liljas, Recognition of structural domains in
globular proteins, J. Mol. Biol., 85:177-181, 1974
[14] W.R. Taylor, Protein structural domain identification, Protein Eng.,
12(3):203-216, 1999.
[15] S. Veretnik, P.E. Bourne, N.N. Alexandrov and I.N Shindyalov, Toward
consistent assignment of structural domains in proteins, J. Mol. Biol.,
339:647-678, 2004.
[16] D. Vitkup, E. Melamud, J. Moult and C. Sander, Completness in
structural genomics, Nat. Struct. Biol., 8(6):559-566, 2001.
[17] D. Wetlaufer, Nucleation, rapid folding, and globular intrachain regions
in proteins, Proc. Natl. Acad. Sci., 70:697-701, 1973
[18] Y. Xu, D. Xu and H.N. Gabow, Protein domain decomposition using a
graph-theoretic apprach, Bioinformatics, 16(12):1091-1104, 2000.
[1] H. Kopka and P. W. Daly, A Guide to LATEX, 3rd ed. Harlow, England:
Addison-Wesley, 1999.
[2] A. Andreeva, D. Howorth, S.E. Brenner, T.J. Hubbard, C. Chothia and
A.G. Murzin, SCOP database in 2004: refinements integrate structure and
sequence fmily data, Nucleic Acids Res., 32:D226-229, 2004.
[3] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr, M.D. Brice,
J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi, The Protein Data
Bank: A computer-based archival file for macromolecular structures. J.
Mol. Biol., 112:535-542, 1977.
[4] R.F. Doolittle, The multiplicity of domains in proteins, Annu. Rev.
Biochem., 64:287-314, 1995.
[5] J-t. Guo, D. Xu, D. Kim and Y.Xu, Improving the performance of
DomainParser for structural domain partition using neural network, Nucl.
Acids Res., 31(3)944-952, 2003.
[6] L. Holm and C. Sander, Dictionary of recurrent domains in protein
structures, Proteins: Structure, Function and Genetics, 33:88-96, 1998.
[7] P.J. Kraulis, Molscript: A program to produce both detailed and schematic
plots of protein sturctures, J. of Appl. Chrystallograph, 24:946-950, 1991.
[8] N.J. Mulder et al., InterPro, progress and status in 2005, Nucleic Acids
Res., 33:D201-205, 2005.
[9] A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, SCOP: a structural
classification of proteins database for the investigation of sequences
and structures, J. Mol. Biol., 247:536-540, 1995.
[10] D.C. Phillips, The dree-dimensional structure of an enzyme molecule,
Sci. Am., 215:78-90, 1966.
[11] A.S. Siddiqui and G.J. Barton, Contious and discontinuous domains:
An algorithm for the automatic generation of reliable protein domain
definietions, Protein Science, 4:872-884, 1995.
[12] G.D. Rose, Hierachic organization of domains in globular proteins, J.
Mol. Biol., 134(3):447-470, 1979.
[13] M.G. Rossmann and A. Liljas, Recognition of structural domains in
globular proteins, J. Mol. Biol., 85:177-181, 1974
[14] W.R. Taylor, Protein structural domain identification, Protein Eng.,
12(3):203-216, 1999.
[15] S. Veretnik, P.E. Bourne, N.N. Alexandrov and I.N Shindyalov, Toward
consistent assignment of structural domains in proteins, J. Mol. Biol.,
339:647-678, 2004.
[16] D. Vitkup, E. Melamud, J. Moult and C. Sander, Completness in
structural genomics, Nat. Struct. Biol., 8(6):559-566, 2001.
[17] D. Wetlaufer, Nucleation, rapid folding, and globular intrachain regions
in proteins, Proc. Natl. Acad. Sci., 70:697-701, 1973
[18] Y. Xu, D. Xu and H.N. Gabow, Protein domain decomposition using a
graph-theoretic apprach, Bioinformatics, 16(12):1091-1104, 2000.
@article{"International Journal of Medical, Medicine and Health Sciences:60597", author = "Frank Emmert Streib", title = "Protein Graph Partitioning by Mutually Maximization of cycle-distributions", abstract = "The classification of the protein structure is commonly
not performed for the whole protein but for structural domains, i.e.,
compact functional units preserved during evolution. Hence, a first
step to a protein structure classification is the separation of the
protein into its domains. We approach the problem of protein domain
identification by proposing a novel graph theoretical algorithm. We
represent the protein structure as an undirected, unweighted and
unlabeled graph which nodes correspond the secondary structure
elements of the protein. This graph is call the protein graph. The
domains are then identified as partitions of the graph corresponding
to vertices sets obtained by the maximization of an objective function,
which mutually maximizes the cycle distributions found in the
partitions of the graph. Our algorithm does not utilize any other kind
of information besides the cycle-distribution to find the partitions. If
a partition is found, the algorithm is iteratively applied to each of
the resulting subgraphs. As stop criterion, we calculate numerically
a significance level which indicates the stability of the predicted
partition against a random rewiring of the protein graph. Hence,
our algorithm terminates automatically its iterative application. We
present results for one and two domain proteins and compare our
results with the manually assigned domains by the SCOP database
and differences are discussed.", keywords = "Graph partitioning, unweighted graph, protein domains.", volume = "1", number = "8", pages = "497-5", }