Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings
Protein 3D structure prediction has always been an
important research area in bioinformatics. In particular, the
prediction of secondary structure has been a well-studied research
topic. Despite the recent breakthrough of combining multiple
sequence alignment information and artificial intelligence algorithms
to predict protein secondary structure, the Q3 accuracy of various
computational prediction algorithms has rarely exceeded 75%. In a
previous paper [1], this research team presented a rule-based method
called RT-RICO (Relaxed Threshold Rule Induction from Coverings)
to predict protein secondary structure. The average Q3 accuracy on
the sample datasets using RT-RICO was 80.3%, an improvement
over comparable computational methods. Although this demonstrated
that RT-RICO might be a promising approach for predicting
secondary structure, the algorithm's computational complexity and
program running time limited its use. Herein a parallelized
implementation of a slightly modified RT-RICO approach is
presented. This new version of the algorithm facilitated the testing of
a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO
achieved a Q3 score of 74.6%, which is higher than the
consensus prediction accuracy of 72.9% that was achieved for the
same test dataset by a combination of four secondary structure
prediction methods [2].
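
The Q3 scores quoted above follow the standard three-state definition: the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. A minimal sketch of that calculation is given below; the function name `q3_score` and the example strings are illustrative assumptions, not code or data from the RT-RICO implementation.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the three-state per-residue accuracy (Q3) as a percentage.

    Both arguments are strings over the alphabet H (helix), E (strand),
    and C (coil), one character per residue. This helper is hypothetical
    and only illustrates the standard Q3 definition.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up 18-residue example (not taken from any dataset in the paper).
observed = "CCHHHHHHCCEEEEECCC"
predicted = "CCHHHHHCCCEEEECCCC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

In practice a dataset-level Q3 is reported either as the mean of per-protein scores or as a single score pooled over all residues; the abstract does not say which convention was used.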
[1] L. Lee, J. L. Leopold, R. L. Frank and A. M. Maglia, "Protein Secondary
Structure Prediction Using Rule Induction from Coverings",
Proceedings of IEEE Symposium on Computational Intelligence in
Bioinformatics and Computational Biology 2009 (part of IEEE
Symposium Series on Computational Intelligence 2009), Nashville,
Tennessee, USA, pp. 79-86.
[2] J. A. Cuff and G. J. Barton, "Evaluation and improvement of multiple
sequence methods for protein secondary structure prediction", Proteins,
1999, 34, pp. 508-519.
[3] B. Rost, "Rising accuracy of protein secondary structure prediction", D.
Chasman, Ed., Protein structure determination, analysis, and modeling
for drug discovery, New York: Dekker, 2003, pp. 207-249.
[4] W. Kabsch and C. Sander, "How good are predictions of protein
secondary structure?", FEBS Letters, 155, pp. 179-182, 1983.
[5] B. Rost and C. Sander, "Prediction of protein secondary structure at
better than 70% accuracy", J. Mol. Biol., 232, pp. 584-599, 1993.
[6] R. D. King and M. J. E. Sternberg, "Identification and application of the
concepts important for accurate and reliable protein secondary structure
prediction", Protein Sci, 1996, 5, pp. 2298-2310.
[7] D. Frishman and P. Argos, "Seventy-five percent accuracy in protein
secondary structure prediction", Proteins, 1997, 27, pp. 329-335.
[8] A. A. Salamov and V. V. Solovyev, "Prediction of protein secondary
structure by combining nearest-neighbor algorithms and multiple
sequence alignments", J Mol Biol, 1995, 247, pp. 11-15.
[9] U. Y. Fadime, Y. Özlem, and T. Metin, "Prediction of secondary
structures of proteins using a two-stage method", Computers &
Chemical Engineering, 2008, 32(1-2), pp. 78-88.
[10] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H.
Weissig, I. N. Shindyalov, and P. E. Bourne, "The Protein Data Bank",
Nucleic Acids Res, 2000, 28(1), pp. 235-42.
[11] P. Baldi, S. Brunak, Y. Chauvin, C. A. F. Andersen, and H. Nielsen,
"Assessing the accuracy of prediction algorithms for classification: an
overview", Bioinformatics, 2000, 16(5), pp. 412-24.
[12] C. T. Zhang and R. Zhang, "Q9, a content-balancing accuracy index to
evaluate algorithms of protein secondary structure prediction", Int J
Biochem Cell Biol, 2003, 35(8), pp. 1256-62.
[13] D. T. Jones, "Protein secondary structure prediction based on position-specific
scoring matrices", J Mol Biol, 1999, 292(2), pp. 195-202.
[14] K. Bryson, L. J. McGuffin, R. L. Marsden, J. J. Ward, J. S. Sodhi, and D.
T. Jones, "Protein structure prediction servers at University College
London", Nucleic Acids Res, 2005, 33(Web Server issue), pp. W36-8.
[15] S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W.
Miller, and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs", Nucleic Acids Res,
1997, 25(17), pp. 3389-402.
[16] J. A. Cuff, and G. J. Barton, "Application of multiple sequence
alignment profiles to improve protein secondary structure prediction",
Proteins, 2000, 40(3), pp. 502-11.
[17] M. Levitt and C. Chothia, "Structural patterns in globular proteins",
Nature, 1976, 261(5561), pp. 552-8.
[18] A. M. Maglia, J. L. Leopold, and V. R. Ghatti, "Identifying Character
Non-Independence in Phylogenetic Data Using Data Mining
Techniques", Proc. Second Asia-Pacific Bioinformatics Conference,
Dunedin, New Zealand, 2004.
[19] J. L. Leopold, A. M. Maglia, M. Thakur, B. Patel, and F. Ercal,
"Identifying Character Non-Independence in Phylogenetic Data Using
Parallelized Rule Induction From Coverings", Data Mining VIII: Data,
Text, and Web Mining and Their Business Applications, WIT
Transactions on Information and Communication Technologies, 2007,
38, pp. 45-54.
[20] A. Andreeva, D. Howorth, J. M. Chandonia, S. E. Brenner, T. J.
Hubbard, C. Chothia, and A. G. Murzin, "Data growth and its impact
on the SCOP database: new developments", Nucleic Acids Res, 2008,
36(Database issue), pp. D419-25.
[21] A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, "SCOP: a
structural classification of proteins database for the investigation of
sequences and structures", J Mol Biol, 1995, 247(4), pp. 536-40.
[22] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H.
Weissig, I. N. Shindyalov, and P. E. Bourne, "The Protein Data Bank",
Nucleic Acids Res, 2000, 28(1), pp. 235-42.
[23] J. L. Klepeis, and C. A. Floudas, "Ab initio prediction of helical
segments in polypeptides", J Comput Chem, 2002, 23(2), pp. 245-66.
[24] J. Han, and M. Kamber, Data Mining: Concepts and Techniques.
Morgan Kaufmann, 2001, pp. 155-157.
@article{"International Journal of Information, Control and Computer Sciences:52928", author = "Leong Lee and Cyriac Kandoth and Jennifer L. Leopold and Ronald L. Frank", title = "Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings", abstract = "Protein 3D structure prediction has always been an
important research area in bioinformatics. In particular, the
prediction of secondary structure has been a well-studied research
topic. Despite the recent breakthrough of combining multiple
sequence alignment information and artificial intelligence algorithms
to predict protein secondary structure, the Q3 accuracy of various
computational prediction algorithms has rarely exceeded 75%. In a
previous paper [1], this research team presented a rule-based method
called RT-RICO (Relaxed Threshold Rule Induction from Coverings)
to predict protein secondary structure. The average Q3 accuracy on
the sample datasets using RT-RICO was 80.3%, an improvement
over comparable computational methods. Although this demonstrated
that RT-RICO might be a promising approach for predicting
secondary structure, the algorithm's computational complexity and
program running time limited its use. Herein a parallelized
implementation of a slightly modified RT-RICO approach is
presented. This new version of the algorithm facilitated the testing of
a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO
achieved a Q3 score of 74.6%, which is higher than the
consensus prediction accuracy of 72.9% that was achieved for the
same test dataset by a combination of four secondary structure
prediction methods [2].", keywords = "data mining, protein secondary structure prediction,parallelization.", volume = "3", number = "12", pages = "2776-7", }