Abstract: The inhibition of SH2 domain regulated protein-protein interactions is an attractive target for developing an effective chemotherapeutic approach in the treatment of disease. Molecular simulation is a useful tool for developing new drugs and for studying molecular recognition. In this study, we searched potential drug compounds for the inhibition of SH2 domain by performing structural similarity search in PubChem Compound Database. A total of 37 compounds were screened from the database, and then we used the LibDock docking program to evaluate the inhibition effect. The best three compounds (AP22408, CID 71463546 and CID 9917321) were chosen for MD simulations after the LibDock docking. Our results show that the compound CID 9917321 can produce a more stable protein-ligand complex compared to other two currently known inhibitors of Src SH2 domain. The compound CID 9917321 may be useful for the inhibition of SH2 domain based on these computational results. Subsequently experiments are needed to verify the effect of compound CID 9917321 on the SH2 domain in the future studies.
Abstract: In Arabidopsis, several genes encoding proteins with ankyrin repeats and transmembrane domains (AtANKTM) have been identified as mediators of biotic and abiotic stress responses. It has been known that the expression of an AtANKTM gene, At2g24600, is induced in response to abiotic stress and that there are four splicing variants derived from this locus. In this study, by RT-PCR and sequencing analysis, an unknown splicing variant of the At2g24600 transcript was identified. Based on differences in the predicted amino acid sequences, the five splicing variants are divided into three groups. The three predicted proteins are highly homologous, yet have different numbers of ankyrinrepeats and transmembrane domains. It is generally considered that ankyrin repeats mediate protein-protein interaction and that the number oftransmembrane domains affects membrane topology of proteins. The protein variants derived from the At2g24600 locus may have different molecular functions each other.
Abstract: Protein-protein interactions (PPI) play a crucial role in many biological processes such as cell signalling, transcription, translation, replication, signal transduction, and drug targeting, etc. Structural information about protein-protein interaction is essential for understanding the molecular mechanisms of these processes. Structures of protein-protein complexes are still difficult to obtain by biophysical methods such as NMR and X-ray crystallography, and therefore protein-protein docking computation is considered an important approach for understanding protein-protein interactions. However, reliable prediction of the protein-protein complexes is still under way. In the past decades, several grid-based docking algorithms based on the Katchalski-Katzir scoring scheme were developed, e.g., FTDock, ZDOCK, HADDOCK, RosettaDock, HEX, etc. However, the success rate of protein-protein docking prediction is still far from ideal. In this work, we first propose a more practical measure for evaluating the success of protein-protein docking predictions,the rate of first success (RFS), which is similar to the concept of mean first passage time (MFPT). Accordingly, we have assessed the ZDOCK bound and unbound benchmarks 2.0 and 3.0. We also createda new benchmark set for protein-protein docking predictions, in which the complexes have experimentally determined binding affinity data. We performed free energy calculation based on the solution of non-linear Poisson-Boltzmann equation (nlPBE) to improve the binding mode prediction. We used the well-studied thebarnase-barstarsystem to validate the parameters for free energy calculations. Besides,thenlPBE-based free energy calculations were conducted for the badly predicted cases by ZDOCK and ZRANK. We found that direct molecular mechanics energetics cannot be used to discriminate the native binding pose from the decoys.Our results indicate that nlPBE-based calculations appeared to be one of the promising approaches for improving the success rate of binding pose predictions.
Abstract: Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.
Abstract: This paper describes a novel approach for deriving
modules from protein-protein interaction networks, which combines
functional information with topological properties of the network.
This approach is based on weighted clustering coefficient, which
uses weights representing the functional similarities between the
proteins. These weights are calculated according to the semantic
similarity between the proteins, which is based on their Gene
Ontology terms. We recently proposed an algorithm for identification
of functional modules, called SWEMODE (Semantic WEights for
MODule Elucidation), that identifies dense sub-graphs containing
functionally similar proteins. The rational underlying this approach is
that each module can be reduced to a set of triangles (protein triplets
connected to each other). Here, we propose considering semantic
similarity weights of all triangle-forming edges between proteins. We
also apply varying semantic similarity thresholds between
neighbours of each node that are not neighbours to each other (and
hereby do not form a triangle), to derive new potential triangles to
include in module-defining procedure. The results show an
improvement of pure topological approach, in terms of number of
predicted modules that match known complexes.
Abstract: Proteomics is one of the largest areas of research for
bioinformatics and medical science. An ambitious goal of proteomics
is to elucidate the structure, interactions and functions of all proteins
within cells and organisms. Predicting Protein-Protein Interaction
(PPI) is one of the crucial and decisive problems in current research.
Genomic data offer a great opportunity and at the same time a lot of
challenges for the identification of these interactions. Many methods
have already been proposed in this regard. In case of in-silico
identification, most of the methods require both positive and negative
examples of protein interaction and the perfection of these examples
are very much crucial for the final prediction accuracy. Positive
examples are relatively easy to obtain from well known databases. But
the generation of negative examples is not a trivial task. Current PPI
identification methods generate negative examples based on some
assumptions, which are likely to affect their prediction accuracy.
Hence, if more reliable negative examples are used, the PPI prediction
methods may achieve even more accuracy. Focusing on this issue, a
graph based negative example generation method is proposed, which
is simple and more accurate than the existing approaches. An
interaction graph of the protein sequences is created. The basic
assumption is that the longer the shortest path between two
protein-sequences in the interaction graph, the less is the possibility of
their interaction. A well established PPI detection algorithm is
employed with our negative examples and in most cases it increases
the accuracy more than 10% in comparison with the negative pair
selection method in that paper.
Abstract: In this study, a high accuracy protein-protein interaction
prediction method is developed. The importance of the proposed
method is that it only uses sequence information of proteins while
predicting interaction. The method extracts phylogenetic profiles of
proteins by using their sequence information. Combining the phylogenetic
profiles of two proteins by checking existence of homologs
in different species and fitting this combined profile into a statistical
model, it is possible to make predictions about the interaction status
of two proteins.
For this purpose, we apply a collection of pattern recognition
techniques on the dataset of combined phylogenetic profiles of protein
pairs. Support Vector Machines, Feature Extraction using ReliefF,
Naive Bayes Classification, K-Nearest Neighborhood Classification,
Decision Trees, and Random Forest Classification are the methods
we applied for finding the classification method that best predicts
the interaction status of protein pairs. Random Forest Classification
outperformed all other methods with a prediction accuracy of 76.93%
Abstract: The study of proteomics reached unexpected levels of
interest, as a direct consequence of its discovered influence over some
complex biological phenomena, such as problematic diseases like
cancer. This paper presents the latest authors- achievements regarding
the analysis of the networks of proteins (interactome networks), by
computing more efficiently the betweenness centrality measure. The
paper introduces the concept of betweenness centrality, and then
describes how betweenness computation can help the interactome net-
work analysis. Current sequential implementations for the between-
ness computation do not perform satisfactory in terms of execution
times. The paper-s main contribution is centered towards introducing
a speedup technique for the betweenness computation, based on
modified shortest path algorithms for sparse graphs. Three optimized
generic algorithms for betweenness computation are described and
implemented, and their performance tested against real biological
data, which is part of the IntAct dataset.
Abstract: Yeast cells live in a constantly changing environment that requires the continuous adaptation of their genomic program in order to sustain their homeostasis, survive and proliferate. Due to the advancement of high throughput technologies, there is currently a large amount of data such as gene expression, gene deletion and protein-protein interactions for S. Cerevisiae under various environmental conditions. Mining these datasets requires efficient computational methods capable of integrating different types of data, identifying inter-relations between different components and inferring functional groups or 'modules' that shape intracellular processes. This study uses computational methods to delineate some of the mechanisms used by yeast cells to respond to environmental changes. The GRAM algorithm is first used to integrate gene expression data and ChIP-chip data in order to find modules of coexpressed and co-regulated genes as well as the transcription factors (TFs) that regulate these modules. Since transcription factors are themselves transcriptionally regulated, a three-layer regulatory cascade consisting of the TF-regulators, the TFs and the regulated modules is subsequently considered. This three-layer cascade is then modeled quantitatively using artificial neural networks (ANNs) where the input layer corresponds to the expression of the up-stream transcription factors (TF-regulators) and the output layer corresponds to the expression of genes within each module. This work shows that (a) the expression of at least 33 genes over time and for different stress conditions is well predicted by the expression of the top layer transcription factors, including cases in which the effect of up-stream regulators is shifted in time and (b) identifies at least 6 novel regulatory interactions that were not previously associated with stress-induced changes in gene expression. These findings suggest that the combination of gene expression and protein-DNA interaction data with artificial neural networks can successfully model biological pathways and capture quantitative dependencies between distant regulators and downstream genes.
Abstract: Understanding the cell's large-scale organization is an interesting task in computational biology. Thus, protein-protein interactions can reveal important organization and function of the cell. Here, we investigated the correspondence between protein interactions and function for the yeast. We obtained the correlations among the set of proteins. Then these correlations are clustered using both the hierarchical and biclustering methods. The detailed analyses of proteins in each cluster were carried out by making use of their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. On the other hand, in hierarchical clustering, the dominancy of one functional class is observed. In the light of the clustering data, we have verified some interactions which were not identified as core interactions in DIP and also, we have characterized some functionally unknown proteins according to the interaction data and functional correlation. In brief, from interaction data to function, some correlated results are noticed about the relationship between interaction and function which might give clues about the organization of the proteins, also to predict new interactions and to characterize functions of unknown proteins.
Abstract: Detecting protein-protein interactions is a central problem in computational biology and aberrant such interactions may have implicated in a number of neurological disorders. As a result, the prediction of protein-protein interactions has recently received considerable attention from biologist around the globe. Computational tools that are capable of effectively identifying protein-protein interactions are much needed. In this paper, we propose a method to detect protein-protein interaction based on substring similarity measure. Two protein sequences may interact by the mean of the similarities of the substrings they contain. When applied on the currently available protein-protein interaction data for the yeast Saccharomyces cerevisiae, the proposed method delivered reasonable improvement over the existing ones.
Abstract: Transcription factors are a group of proteins that
helps for interpreting the genetic information in DNA.
Protein-protein interactions play a major role in the execution
of key biological functions of a cell. These interactions are
represented in the form of a graph with nodes and edges.
Studies have showed that some nodes have high degree of
connectivity and such nodes, known as hub nodes, are the
inevitable parts of the network. In the present paper a method
is proposed to identify hub transcription factor proteins using
sequence information. On a complete data set of transcription
factor proteins available from the APID database, the
proposed method showed an accuracy of 77%, sensitivity of
79% and specificity of 76%.
Abstract: Recent years have seen a growing trend towards the
integration of multiple information sources to support large-scale
prediction of protein-protein interaction (PPI) networks in model
organisms. Despite advances in computational approaches, the
combination of multiple “omic" datasets representing the same type
of data, e.g. different gene expression datasets, has not been
rigorously studied. Furthermore, there is a need to further investigate
the inference capability of powerful approaches, such as fullyconnected
Bayesian networks, in the context of the prediction of PPI
networks. This paper addresses these limitations by proposing a
Bayesian approach to integrate multiple datasets, some of which
encode the same type of “omic" data to support the identification of
PPI networks. The case study reported involved the combination of
three gene expression datasets relevant to human heart failure (HF).
In comparison with two traditional methods, Naive Bayesian and
maximum likelihood ratio approaches, the proposed technique can
accurately identify known PPI and can be applied to infer potentially
novel interactions.
Abstract: We have developed an energy based approach for identifying the binding sites and important residues for binding in protein-protein complexes. We found that the residues and residuepairs with charged and aromatic side chains are important for binding. These residues influence to form cation-¤Ç, electrostatic and aromatic interactions. Our observation has been verified with the experimental binding specificity of protein-protein complexes and found good agreement with experiments. The analysis on surrounding hydrophobicity reveals that the binding residues are less hydrophobic than non-binding sites, which suggests that the hydrophobic core are important for folding and stability whereas the surface seeking residues play a critical role in binding. Further, the propensity of residues in the binding sites of receptors and ligands, number of medium and long-range contacts, and influence of neighboring residues will be discussed.
Abstract: Due to the ever growing amount of publications about
protein-protein interactions, information extraction from text is
increasingly recognized as one of crucial technologies in
bioinformatics. This paper presents a Protein Interaction Extraction
System using a Link Grammar Parser from biomedical abstracts
(PIELG). PIELG uses linkage given by the Link Grammar Parser to
start a case based analysis of contents of various syntactic roles as
well as their linguistically significant and meaningful combinations.
The system uses phrasal-prepositional verbs patterns to overcome
preposition combinations problems. The recall and precision are
74.4% and 62.65%, respectively. Experimental evaluations with two
other state-of-the-art extraction systems indicate that PIELG system
achieves better performance. For further evaluation, the system is
augmented with a graphical package (Cytoscape) for extracting
protein interaction information from sequence databases. The result
shows that the performance is remarkably promising.
Abstract: The protein domain structure has been widely used as the most informative sequence feature to computationally predict protein-protein interactions. However, in a recent study, a research group has reported a very high accuracy of 94% using hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as the sequence features. Using the Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved accuracy of nearly 80%. Furthermore, domains structure had receiver operating characteristic (ROC) score of 0.8480 with running time of 34 seconds, while hydrophobicity had ROC score of 0.8159 with running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interaction can be predicted from domain structure with reliable accuracy and acceptable running time.
Abstract: A New features are extracted and compared to
improve the prediction of protein-protein interactions. The basic idea
is to select and use the best set of features from the Tensor matrices
that are produced by the frequency vectors of the protein sequences.
Three set of features are compared, the first set is based on the
indices that are the most common in the interacting proteins, the
second set is based on the indices that tend to be common in the
interacting and non-interacting proteins, and the third set is
constructed by using random indices. Moreover, three encoding
strategies are compared; that are based on the amino asides polarity,
structure, and chemical properties. The experimental results indicate
that the highest accuracy can be obtained by using random indices
with chemical properties encoding strategy and support vector
machine.
Abstract: Understanding proteins functions is a major goal in
the post-genomic era. Proteins usually work in context of other
proteins and rarely function alone. Therefore, it is highly relevant to
study the interaction partners of a protein in order to understand its
function. Machine learning techniques have been widely applied to
predict protein-protein interactions. Kernel functions play an
important role for a successful machine learning technique. Choosing
the appropriate kernel function can lead to a better accuracy in a
binary classifier such as the support vector machines. In this paper,
we describe a Bayesian kernel for the support vector machine to
predict protein-protein interactions. The use of Bayesian kernel can
improve the classifier performance by incorporating the probability
characteristic of the available experimental protein-protein
interactions data that were compiled from different sources. In
addition, the probabilistic output from the Bayesian kernel can assist
biologists to conduct more research on the highly predicted
interactions. The results show that the accuracy of the classifier has
been improved using the Bayesian kernel compared to the standard
SVM kernels. These results imply that protein-protein interaction can
be predicted using Bayesian kernel with better accuracy compared to
the standard SVM kernels.