Improving Protein-Protein Interaction Prediction by Using Encoding Strategies and Random Indices

New features are extracted and compared to improve the prediction of protein-protein interactions. The basic idea is to select the best set of features from the tensor matrices produced from the frequency vectors of the protein sequences. Three sets of features are compared: the first is based on the indices that are most common in interacting proteins, the second is based on the indices that tend to be common in both interacting and non-interacting proteins, and the third is constructed from random indices. In addition, three encoding strategies are compared, based on the polarity, structure, and chemical properties of the amino acids. The experimental results indicate that the highest accuracy is obtained by using random indices with the chemical-properties encoding strategy and a support vector machine.
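
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the amino acid grouping in CHEMICAL_GROUPS is a common textbook grouping chosen for illustration, the "tensor matrix" of a protein pair is taken to be the outer product of the two frequency vectors, and the function names (frequency_vector, pair_matrix, random_indices, select_features) are hypothetical. The classifier step is left out; the selected features would then be passed to a classifier such as a support vector machine.

```python
import random

# Illustrative grouping of the 20 amino acids by side-chain chemistry.
# This is an assumption, not necessarily the grouping used in the paper.
CHEMICAL_GROUPS = {
    "aliphatic": "AGILPV",
    "aromatic": "FWY",
    "sulfur": "CM",
    "hydroxyl": "ST",
    "basic": "HKR",
    "acidic": "DE",
    "amide": "NQ",
}
GROUP_NAMES = list(CHEMICAL_GROUPS)
AA_TO_GROUP = {
    aa: index
    for index, members in enumerate(CHEMICAL_GROUPS.values())
    for aa in members
}


def frequency_vector(sequence):
    """Return the normalized frequency of each chemical group in a sequence."""
    counts = [0] * len(GROUP_NAMES)
    for aa in sequence.upper():
        group = AA_TO_GROUP.get(aa)
        if group is not None:  # skip unknown residues such as 'X'
            counts[group] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


def pair_matrix(seq_a, seq_b):
    """Outer product of the two frequency vectors, flattened row by row.

    Assumed reading of the 'tensor matrix' for a protein pair.
    """
    freq_a = frequency_vector(seq_a)
    freq_b = frequency_vector(seq_b)
    return [x * y for x in freq_a for y in freq_b]


def random_indices(n_total, n_selected, seed=0):
    """Draw a reproducible random subset of feature indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_selected))


def select_features(flat_matrix, indices):
    """Keep only the matrix entries at the chosen indices."""
    return [flat_matrix[i] for i in indices]


# Example: one protein pair reduced to 10 randomly indexed features.
n_groups = len(GROUP_NAMES)
indices = random_indices(n_groups * n_groups, 10)
features = select_features(pair_matrix("MKTAYIAKQR", "GSHMDEEL"), indices)
```

The same index subset has to be reused for every protein pair so that the feature columns line up across the training and test sets, which is why the sketch fixes the random seed.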
