Abstract: To understand life as a biological system, an evolutionary
perspective is indispensable. Protein interaction data are rapidly
accumulating and are suitable for system-level evolutionary analysis.
We have analyzed the yeast protein interaction network by both
mathematical and biological approaches. In this poster presentation,
we inferred the evolutionary birth periods of yeast proteins by
reconstructing phylogenetic profiles. It has been thought that hub
proteins, which have a high connection degree, are evolutionarily old.
However, our analysis showed that hub proteins are entirely evolutionarily new.
We also examined the evolutionary processes of protein complexes. This
analysis showed that the member proteins of a complex tended to have
appeared in the same evolutionary period. Our results suggest that
the protein interaction network evolved through modules that form
functional units. We also reconstructed standardized phylogenetic trees
and calculated the evolutionary rates of yeast proteins. This showed that
there is no obvious correlation between the evolutionary rates and
connection degrees of yeast proteins.
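
As an illustration of the birth-period inference described above, the following is a minimal sketch, not the authors' implementation. It assumes that each protein's phylogenetic profile records whether a homolog is detected in each of several taxonomic layers, and that a protein's birth period is the oldest layer in which a homolog is found. The layer names, the toy profiles, and the helper functions `birth_period` and `mean_degree_by_period` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical taxonomic layers, ordered from oldest to youngest.
# These labels are assumptions made for illustration only.
PERIODS = ["cellular_life", "eukaryota", "fungi", "saccharomyces"]


def birth_period(profile):
    """Return the oldest period in which a homolog is detected.

    `profile` maps a period name to True/False (homolog detected or not).
    """
    for period in PERIODS:  # scan from oldest to youngest
        if profile.get(period, False):
            return period
    return PERIODS[-1]  # no homolog detected anywhere: treat as youngest


def mean_degree_by_period(profiles, edges):
    """Average connection degree of proteins, grouped by birth period."""
    degree = defaultdict(int)
    for a, b in edges:  # each undirected interaction adds 1 to both ends
        degree[a] += 1
        degree[b] += 1
    grouped = defaultdict(list)
    for protein, profile in profiles.items():
        grouped[birth_period(profile)].append(degree[protein])
    return {p: sum(d) / len(d) for p, d in grouped.items()}


# Toy data (invented, not taken from the study).
profiles = {
    "P1": {"cellular_life": True, "eukaryota": True, "fungi": True},
    "P2": {"eukaryota": True, "fungi": True, "saccharomyces": True},
    "P3": {"fungi": True, "saccharomyces": True},
    "P4": {"saccharomyces": True},
}
edges = [("P1", "P2"), ("P3", "P4"), ("P3", "P2"), ("P3", "P1")]

result = mean_degree_by_period(profiles, edges)
for period in PERIODS:
    print(period, result.get(period))
```

Comparing the per-period averages is one simple way to check whether highly connected proteins cluster in older or younger periods; the study itself may have used a different grouping or statistic.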
Abstract: In this study, a high-accuracy protein-protein interaction
prediction method is developed. The significance of the proposed
method is that it uses only the sequence information of proteins when
predicting interactions. The method extracts phylogenetic profiles of
proteins by using their sequence information. By combining the phylogenetic
profiles of two proteins, which record the existence of homologs
in different species, and fitting this combined profile to a statistical
model, it is possible to predict the interaction status
of the two proteins.
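
The combination step can be sketched as follows. This is only one plausible encoding and is an assumption, since the abstract does not state how the two profiles are merged. Each profile is taken to be a binary vector over the same ordered list of species (1 if a homolog exists, 0 otherwise), and the hypothetical helper `combine_profiles` assigns each species one of four states.

```python
def combine_profiles(profile_a, profile_b):
    """Combine two binary phylogenetic profiles into one feature vector.

    Per-species encoding (an assumption, not taken from the source):
      0 = homolog of neither protein
      1 = homolog of protein A only
      2 = homolog of protein B only
      3 = homologs of both proteins
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    return [a + 2 * b for a, b in zip(profile_a, profile_b)]


# Toy profiles over four species (invented data).
protein_a = [1, 1, 0, 0]
protein_b = [1, 0, 1, 0]

combined = combine_profiles(protein_a, protein_b)
print(combined)  # [3, 1, 2, 0]
```

A vector of this kind, one per protein pair, could then serve as the input features of a classifier that predicts whether the pair interacts.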
For this purpose, we apply a collection of pattern recognition
techniques to the dataset of combined phylogenetic profiles of protein
pairs. Support Vector Machines, Feature Extraction using ReliefF,
Naive Bayes Classification, K-Nearest Neighbor Classification,
Decision Trees, and Random Forest Classification are the methods
we applied to find the classification method that best predicts
the interaction status of protein pairs. Random Forest Classification
outperformed all other methods with a prediction accuracy of 76.93%