Introducing Sequence-Order Constraint into Prediction of Protein Binding Sites with Automatically Extracted Templates

Searching for a tertiary substructure that geometrically matches the 3D pattern of the binding site of a well-studied protein provides a way to predict protein function. In our previous work, we built a web server that predicts protein-ligand binding sites based on automatically extracted templates. However, a drawback of such templates is that they made the web server prone to producing many false-positive matches. In this study, we present a sequence-order constraint that reduces the false-positive matches produced when automatically extracted templates are used to predict protein-ligand binding sites. The binding site predictor comprises (i) an automatically constructed template library and (ii) a local structure alignment algorithm for querying the library. The sequence-order constraint is used to identify inconsistencies between the local regions of the query protein and those of the templates. Experimental results show that the sequence-order constraint can substantially reduce the number of false-positive matches and is effective for template-based binding site prediction.
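The abstract does not specify how the sequence-order constraint is scored. The sketch below shows one plausible reading, under our own assumptions. It treats a template match as a one-to-one correspondence between template residue numbers and query residue numbers, and it calls the match sequence-order consistent when sorting the pairs by template position leaves the query positions strictly increasing. The helper names (`is_sequence_order_consistent`, `order_preserving_fraction`, `passes_constraint`), the longest-increasing-subsequence score, and the 0.8 threshold are hypothetical illustrations, not the authors' method.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical representation: (template residue number, query residue number).
# Assumes a one-to-one matching, so template positions are distinct.
Pair = Tuple[int, int]


def is_sequence_order_consistent(pairs: List[Pair]) -> bool:
    """True if query positions strictly increase when pairs are sorted by template position."""
    ordered = sorted(pairs)
    return all(a[1] < b[1] for a, b in zip(ordered, ordered[1:]))


def order_preserving_fraction(pairs: List[Pair]) -> float:
    """Fraction of matched residues in the largest order-preserving subset.

    Computed as the longest strictly increasing subsequence of query
    positions after sorting the pairs by template position (O(n log n)).
    """
    if not pairs:
        return 0.0
    ordered = sorted(pairs)
    # tails[k] holds the smallest possible last query position of an
    # increasing run of length k + 1 seen so far.
    tails: List[int] = []
    for _, q in ordered:
        k = bisect_left(tails, q)  # bisect_left keeps the subsequence strictly increasing
        if k == len(tails):
            tails.append(q)
        else:
            tails[k] = q
    return len(tails) / len(pairs)


def passes_constraint(pairs: List[Pair], min_fraction: float = 0.8) -> bool:
    """Hypothetical filter: keep a match only if enough of it preserves sequence order."""
    return order_preserving_fraction(pairs) >= min_fraction


if __name__ == "__main__":
    consistent = [(10, 105), (14, 109), (37, 131), (52, 150)]
    reversed_order = [(10, 150), (14, 131), (37, 109), (52, 105)]
    mixed = [(10, 105), (14, 131), (37, 109), (52, 150)]
    for name, match in [("consistent", consistent), ("reversed", reversed_order), ("mixed", mixed)]:
        print(name, is_sequence_order_consistent(match), order_preserving_fraction(match))
```

With these toy matches, the fully ordered correspondence scores 1.0, the reversed one scores 0.25, and the mixed one scores 0.75, so only the first would survive the illustrative 0.8 threshold. Scoring the largest order-preserving subset, rather than rejecting any inversion outright, tolerates a single misplaced residue in an otherwise consistent match.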



