Abstract: In this study, a high-accuracy protein-protein interaction
prediction method is developed. The main advantage of the proposed
method is that it uses only the sequence information of proteins to
predict interactions. The method extracts the phylogenetic profiles of
proteins from their sequence information. By combining the phylogenetic
profiles of two proteins, based on the existence of their homologs
in different species, and fitting the combined profile to a statistical
model, it is possible to predict the interaction status
of the two proteins.
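
To make the idea of a combined profile concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a phylogenetic profile is a binary vector (1 if a homolog exists in a species, 0 otherwise) and that two profiles are merged into one per-species code. The species list, the function names, and the 0 to 3 coding are all hypothetical choices made for illustration.

```python
from typing import List, Set

# Hypothetical reference species; the abstract does not say which genomes are used.
SPECIES = ["E. coli", "S. cerevisiae", "D. melanogaster", "H. sapiens"]


def phylogenetic_profile(homolog_species: Set[str], species: List[str]) -> List[int]:
    """Return 1 for each species in which a homolog was found, else 0."""
    return [1 if s in homolog_species else 0 for s in species]


def combine_profiles(profile_a: List[int], profile_b: List[int]) -> List[int]:
    """Merge two binary profiles into one vector (assumed coding).

    Per species: 0 = homolog of neither protein, 1 = only protein B,
    2 = only protein A, 3 = both proteins.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    return [2 * a + b for a, b in zip(profile_a, profile_b)]


# Toy example: homolog hits would normally come from a sequence search.
profile_a = phylogenetic_profile({"E. coli", "H. sapiens"}, SPECIES)
profile_b = phylogenetic_profile({"E. coli", "S. cerevisiae"}, SPECIES)
combined = combine_profiles(profile_a, profile_b)
print(combined)  # [3, 1, 0, 2]
```

The combined vector is what would be passed to a classifier as the feature vector of the protein pair.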
For this purpose, we apply a collection of pattern recognition
techniques to a dataset of combined phylogenetic profiles of protein
pairs. Support Vector Machines, Feature Extraction using ReliefF,
Naive Bayes Classification, K-Nearest Neighbor Classification,
Decision Trees, and Random Forest Classification were applied
to find the classification method that best predicts
the interaction status of protein pairs. Random Forest Classification
outperformed all other methods, with a prediction accuracy of 76.93%.
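
As an illustration of how one of the listed classifiers could be applied to combined profiles, here is a minimal pure-Python k-nearest-neighbor sketch. It is not the authors' pipeline: the Hamming distance, the value of k, and the toy training and test pairs are assumptions made only for this example, and the resulting accuracy says nothing about the reported 76.93%.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Profile = Sequence[int]
Example = Tuple[Profile, int]  # (combined profile, label: 1 = interacting, 0 = not)


def hamming(a: Profile, b: Profile) -> int:
    """Number of species positions at which two combined profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: List[Example], query: Profile, k: int = 3) -> int:
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train: List[Example], test: List[Example], k: int = 3) -> float:
    """Fraction of test pairs whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data: codes 0-3 per species, as in an assumed combined profile.
TRAIN = [
    ([3, 3, 0, 3], 1), ([3, 0, 3, 3], 1), ([3, 3, 3, 0], 1),
    ([1, 2, 0, 1], 0), ([2, 1, 2, 0], 0), ([0, 2, 1, 2], 0),
]
TEST = [([3, 3, 3, 3], 1), ([1, 2, 1, 0], 0)]

print(accuracy(TRAIN, TEST, k=3))
```

Comparing several classifiers would repeat the same train-and-evaluate step for each method on identical data splits.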

Abstract: New features are extracted and compared in order to
improve the prediction of protein-protein interactions. The basic idea
is to select and use the best set of features from the tensor matrices
that are produced from the frequency vectors of the protein sequences.
Three sets of features are compared: the first set is based on the
indices that are most common in the interacting proteins, the
second set is based on the indices that tend to be common in both the
interacting and non-interacting proteins, and the third set is
constructed by using random indices. Moreover, three encoding
strategies are compared, based on amino acid polarity,
structure, and chemical properties. The experimental results indicate
that the highest accuracy is obtained by using random indices
with the chemical-properties encoding strategy and a support vector
machine.
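
The feature construction described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that each sequence is encoded by mapping residues to property groups, that a frequency vector holds the normalized group counts, and that the tensor matrix of a protein pair is the outer product of the two frequency vectors. The four polarity groups, the function names, and the selected indices are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed polarity-based grouping of the 20 amino acids (one common textbook scheme).
POLARITY_GROUPS: Dict[str, str] = {
    "nonpolar": "AVLIPFWMG",
    "polar": "STCYNQ",
    "basic": "KRH",
    "acidic": "DE",
}
GROUP_NAMES = list(POLARITY_GROUPS)
RESIDUE_TO_GROUP = {
    res: i for i, name in enumerate(GROUP_NAMES) for res in POLARITY_GROUPS[name]
}


def frequency_vector(sequence: str) -> List[float]:
    """Normalized count of residues per group; unknown symbols are skipped."""
    counts = [0] * len(GROUP_NAMES)
    for res in sequence.upper():
        idx = RESIDUE_TO_GROUP.get(res)
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def tensor_matrix(u: List[float], v: List[float]) -> List[List[float]]:
    """Outer product of two frequency vectors (one matrix per protein pair)."""
    return [[x * y for y in v] for x in u]


def select_features(matrix: List[List[float]], indices: List[Tuple[int, int]]) -> List[float]:
    """Keep only the matrix entries at the chosen (row, column) indices."""
    return [matrix[i][j] for i, j in indices]


freq_a = frequency_vector("AKDE")   # toy sequences, not real proteins
freq_b = frequency_vector("AASK")
pair_matrix = tensor_matrix(freq_a, freq_b)
features = select_features(pair_matrix, [(0, 0), (3, 2)])  # hypothetical index set
print(features)
```

Under this reading, the three feature sets in the abstract differ only in how the index list passed to `select_features` is chosen.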