One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Predicting protein-protein interactions is a key step in understanding protein function, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been applied to this prediction task, but most of them treat it as a binary classification problem. Although a dataset of interacting proteins is easy to obtain as positive examples, there are no experimentally confirmed non-interacting proteins to serve as negative examples. In this paper we therefore treat the task as a one-class classification problem and solve it with one-class support vector machines (SVMs). Using only positive examples (interacting protein pairs) in the training phase, the one-class SVM achieves an accuracy of about 80%. These results imply that protein-protein interactions can be predicted with a one-class classifier at an accuracy comparable to that of binary classifiers that use artificially constructed negative examples.
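To make the one-class setting concrete, the following is a minimal sketch of training on positive examples only. It assumes scikit-learn's OneClassSVM (which is built on LIBSVM) and uses synthetic Gaussian vectors as a hypothetical stand-in for protein-pair features; the feature encoding, the RBF kernel, and the values of nu and gamma are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for feature vectors of interacting protein pairs.
# Real features (e.g. domain or sequence encodings) are not reproduced here.
n_features = 20
positives = rng.normal(loc=1.0, scale=0.5, size=(300, n_features))
train, held_out = positives[:200], positives[200:]

# Synthetic points far from the training distribution. These are NOT
# confirmed non-interacting pairs; they only exercise the rejection path.
far_points = rng.normal(loc=-3.0, scale=0.5, size=(100, n_features))

# nu upper-bounds the fraction of training points treated as outliers.
# Kernel and parameter values are assumptions chosen for illustration.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
model.fit(train)  # training sees positive examples only

# predict() returns +1 for points inside the learned region, -1 otherwise.
accept_rate = float(np.mean(model.predict(held_out) == 1))
reject_rate = float(np.mean(model.predict(far_points) == -1))

print(f"held-out positives accepted: {accept_rate:.2f}")
print(f"far-away points rejected:    {reject_rate:.2f}")
```

Because no negative examples enter training, the classifier only learns a region that encloses most of the positives; anything it labels -1 is "unlike the known interactions" rather than a confirmed non-interaction.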



