Protein Secondary Structure Prediction

Protein structure determination and prediction have been a focal research subject in bioinformatics because protein structure is important for understanding the biological and chemical activities of organisms. The experimental methods used by biotechnologists to determine protein structures demand sophisticated equipment and time. A host of computational methods have been developed to predict the location of secondary structure elements in proteins, complementing experimental results or providing insight into them. However, the prediction accuracies of these methods rarely exceed 70%.

Protein Graph Partitioning by Mutual Maximization of Cycle Distributions

Protein structure classification is commonly performed not for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step toward protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph-theoretical algorithm. We represent the protein structure as an undirected, unweighted, and unlabeled graph whose nodes correspond to the secondary structure elements of the protein. This graph is called the protein graph. The domains are then identified as partitions of the graph, corresponding to vertex sets obtained by maximizing an objective function that mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm uses no information other than the cycle distribution to find the partitions. If a partition is found, the algorithm is applied iteratively to each of the resulting subgraphs. As a stopping criterion, we numerically calculate a significance level that indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates its iterative application automatically. We present results for one- and two-domain proteins, compare them with the domains manually assigned in the SCOP database, and discuss the differences.
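
The abstract does not spell out how the cycle distribution is computed or what the objective function looks like. As a rough illustration only, the sketch below assumes the distribution is a histogram of simple-cycle lengths in an undirected graph, which is feasible to enumerate for graphs as small as a protein graph of secondary structure elements. The function name `cycle_distribution`, the adjacency-dictionary input, and the `max_len` cutoff are all hypothetical choices made for this example.

```python
from collections import Counter


def cycle_distribution(adj, max_len):
    """Count simple cycles by length in an undirected graph.

    adj maps each integer node to an iterable of neighbours.
    Only cycles with at most max_len vertices are counted.
    Assumption: "cycle distribution" means this histogram; the
    abstract itself does not define it.
    """
    counts = Counter()

    def extend(start, node, visited, length):
        for nxt in adj[node]:
            if nxt == start and length >= 3:
                # Closed a cycle that contains `length` vertices.
                counts[length] += 1
            elif nxt > start and nxt not in visited and length < max_len:
                # Only visit nodes larger than the start node, so each
                # cycle is found from its smallest vertex only.
                visited.add(nxt)
                extend(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for s in adj:
        extend(s, s, {s}, 1)

    # Every cycle is traversed once in each direction, so halve the counts.
    return {length: n // 2 for length, n in sorted(counts.items())}


# A square 0-1-2-3 with the diagonal 0-2: two triangles and one 4-cycle.
square = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(cycle_distribution(square, max_len=4))
```

A partitioning scheme in the spirit of the abstract would compare such histograms across candidate vertex sets; how they are combined into an objective is not something this sketch attempts to reproduce.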

The Knowledge Representation of the Genetic Regulatory Networks Based on Ontology

Understanding biological behavior and phenomena at the system level requires elements such as gene sequences, protein structures, gene functions, and metabolic pathways. Challenging problems include representing, learning, and reasoning about biochemical reactions, gene and protein structure, the relation between genotype and phenotype, and the expression systems involved in those interactions. The goal of our work is to understand the behavior of interaction networks and to model their evolution in time and space. In this study, we propose an ontological meta-model for the knowledge representation of genetic regulatory networks. In artificial intelligence, an ontology comprises the fundamental categories and relations that provide a framework for knowledge models. Domain ontologies are now commonly used to enable heterogeneous information resources, such as knowledge-based systems, to communicate with each other. The value of our model lies in representing spatial, temporal, and spatio-temporal knowledge. We validated our proposals on the genetic regulatory network of the Arabidopsis thaliana flower.

Protein Residue Contact Prediction using Support Vector Machine

A protein residue contact map is a compact representation of the secondary structure of a protein. Because of the information held in the contact map, it has drawn the attention of researchers in related fields, and a great deal of work has been done over the past decade. Artificial intelligence approaches such as neural networks, genetic programming, hidden Markov models, and support vector machines have been widely adopted in this work. However, prediction performance has not generalized well, probably because it depends on the data used to train and generate the prediction model. This situation shows the importance of the features or information used in determining prediction performance. In this research, a support vector machine was used to predict protein residue contact maps from different combinations of features in order to show and analyze the effectiveness of those features.
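
To make the prediction target concrete, the following sketch builds a binary contact map from residue coordinates. It is an illustration under stated assumptions, not the authors' pipeline: the abstract gives neither a distance cutoff nor an atom choice, so the 8 Å threshold on one representative point per residue (for example the Cα atom) and the `min_separation` parameter are assumptions, and the function name `contact_map` is hypothetical.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix marking residue pairs in contact.

    coords: one (x, y, z) tuple per residue, in sequence order.
    threshold: distance cutoff in angstroms (assumed value).
    min_separation: minimum sequence distance for a pair to be considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Four residues on a line; the last one is far from the others.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
for row in contact_map(toy):
    print(row)
```

Each entry of such a matrix is one binary label, so a classifier like a support vector machine would be trained on a feature vector per residue pair to predict that entry.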

Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

In this paper, we represent each protein structure as a graph, so that a protein structure database becomes a graph database. Each graph is represented by a spectral vector. We use the Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian of the graph's adjacency matrix. To measure the similarity between two graphs, we calculate the Euclidean distance between their spectral vectors. To cluster the graphs, we use an M-tree with the Euclidean distance over the spectral vectors. The M-tree can also be used for graph searching in the graph database. Our proposed method was tested on a database of 100 graphs representing 100 protein structures downloaded from the Protein Data Bank (PDB), and we compare the results with the SCOP hierarchical structure.
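
The spectral-vector step can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: it uses the textbook cyclic Jacobi rotation method on the symmetric normalized Laplacian, and it assumes that vectors of different lengths are zero-padded before the Euclidean distance is taken, which the abstract does not specify. All function names are hypothetical, and the M-tree indexing is not shown.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^(-1/2) A D^(-1/2) for a symmetric adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j]:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q]; |theta| <= pi/4.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_vector(adj):
    return jacobi_eigenvalues(normalized_laplacian(adj))


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between spectral vectors (zero-padded: assumption)."""
    va, vb = spectral_vector(adj_a), spectral_vector(adj_b)
    size = max(len(va), len(vb))
    va += [0.0] * (size - len(va))
    vb += [0.0] * (size - len(vb))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(spectral_vector(triangle))
print(spectral_distance(triangle, path))
```

Because the distance is a true metric on the vectors, it is suitable for a metric index such as an M-tree, which is presumably why the authors pair the two.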

An Algebra for Protein Structure Data

This paper presents an algebraic approach to optimizing queries in a domain-specific database management system for protein structure data. The approach introduces several protein-structure-specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in the algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves as a dictionary.

Sequence-based Prediction of Gamma-turn Types using a Physicochemical Property-based Decision Tree Method

The γ-turns play important roles in protein folding and molecular recognition. The prediction and analysis of γ-turn types are important both for protein structure prediction and for a better understanding of the characteristics of different γ-turn types. This study proposes a physicochemical property-based decision tree (PPDT) method to predict γ-turn types in an interpretable way. In addition to the good prediction performance of PPDT, three simple and human-interpretable IF-THEN rules are extracted from the decision tree constructed by PPDT. The identified informative physicochemical properties and concise rules provide a simple way of discriminating and understanding γ-turn types.

A General Model for Amino Acid Interaction Networks

In this paper we introduce the notion of a protein interaction network. This is a graph whose vertices are the protein's amino acids and whose edges are the interactions between them. Using a graph-theoretic approach, we identify a number of properties of these networks. We compare them to the general small-world network model and analyze their hierarchical structure.
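
Comparison with the small-world model usually rests on two quantities: the average clustering coefficient and the average shortest path length. The sketch below computes both for a small undirected graph given as a dictionary of neighbour sets. It is illustrative only; the abstract does not say which measures the authors report, and the function names are hypothetical.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Number of edges among the neighbours of this node.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy network: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(toy), average_path_length(toy))
```

A small-world network combines a high clustering coefficient with a short average path length relative to a random graph of the same size and density.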

Detecting Community Structure in Amino Acid Interaction Networks

In this paper we introduce the notion of a protein interaction network. This is a graph whose vertices are the protein's amino acids and whose edges are the interactions between them. Using a graph-theoretic approach, we observe that the nodes interact differently according to their structural roles. By performing community structure detection, we confirm this specific behavior and describe the composition of the communities, and finally propose a new approach to folding a protein interaction network.
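
The abstract does not name the community detection method used. One widely used way to score a candidate division of a network into communities is Newman's modularity, shown here purely as an illustration of how such a division can be evaluated. The function name `modularity` and the input format (an edge list plus a node-to-community mapping) are assumptions made for this example.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of  e_c / m - (d_c / (2 m))^2,
    where e_c is the number of edges inside c, d_c is the total degree
    of the nodes in c, and m is the number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}
    degree = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

Placing all nodes in a single community gives Q = 0, so a positive value indicates more intra-community edges than expected by chance.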

Introducing Sequence-Order Constraint into Prediction of Protein Binding Sites with Automatically Extracted Templates

Searching for a tertiary substructure that geometrically matches the 3D pattern of the binding site of a well-studied protein provides a way to predict protein functions. In our previous work, we built a web server to predict protein-ligand binding sites based on automatically extracted templates. However, a drawback of such templates is that the web server was prone to producing many false positive matches. In this study, we present a sequence-order constraint to reduce the false positive matches that arise when automatically extracted templates are used to predict protein-ligand binding sites. The binding site predictor comprises i) an automatically constructed template library and ii) a local structure alignment algorithm for querying the library. The sequence-order constraint is employed to identify inconsistencies between the local regions of the query protein and the templates. Experimental results reveal that the sequence-order constraint can substantially reduce the false positive matches and is effective for template-based binding site prediction.
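
The abstract does not define the sequence-order constraint formally. One plausible reading, sketched below as an assumption, is that a structural match is kept only if the matched query residues appear in the same sequence order as the template residues they are aligned to. The function names and the representation of a match as two parallel lists of residue positions are hypothetical.

```python
def satisfies_sequence_order(template_positions, query_positions):
    """Check that matched residues keep the same relative sequence order.

    template_positions[i] is aligned with query_positions[i].
    Assumed interpretation: after sorting the pairs by template position,
    the query positions must be strictly increasing.
    """
    pairs = sorted(zip(template_positions, query_positions))
    ordered_query = [q for _, q in pairs]
    return all(a < b for a, b in zip(ordered_query, ordered_query[1:]))


def filter_matches(matches):
    """Keep only the matches that satisfy the sequence-order check."""
    return [m for m in matches if satisfies_sequence_order(*m)]


candidates = [
    ([10, 45, 80], [5, 30, 77]),   # same order in template and query
    ([10, 45, 80], [30, 5, 77]),   # first two residues swapped in the query
]
print(filter_matches(candidates))
```

A purely geometric match can pair residues that are spatially similar but arranged differently along the chain; a filter of this kind discards such matches after the structural alignment step.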

Sorting Primitives and Genome Rearrangement in Bioinformatics: A Unified Perspective

Bioinformatics and computational biology involve the use of techniques from applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually at the molecular level. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution. Various global rearrangements of permutations, such as reversals and transpositions, have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. In this work we study some genome rearrangement primitives. We show how a genome is modelled by a permutation, and we introduce some of the existing primitives together with their lower and upper bounds. We then provide a comparison of the introduced primitives.
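
The two primitives defined above can be written down directly on Python lists. The sketch also includes the standard breakpoint count for unsigned permutations, which yields a simple lower bound for sorting by reversals because a single reversal can remove at most two breakpoints. The function names and 0-based, inclusive/exclusive index conventions are choices made for this example, not taken from the paper.

```python
def reversal(perm, i, j):
    """Reverse the substring perm[i..j] (0-based, inclusive)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def transposition(perm, i, j, k):
    """Swap the adjacent substrings perm[i:j] and perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent pairs that are not consecutive integers.

    The permutation of 1..n is framed with 0 and n + 1 first.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversal_lower_bound(perm):
    """Each reversal removes at most two breakpoints: ceil(b / 2)."""
    return (breakpoints(perm) + 1) // 2


genome = [3, 1, 2, 4]
print(breakpoints(genome), reversal_lower_bound(genome))
print(reversal([1, 3, 2, 4], 1, 2))
print(transposition(genome, 0, 1, 3))
```

The example permutation needs two reversals but only one transposition, which illustrates why the distance depends on the set of primitives allowed.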

Bioinformatics Profiling of Missense Mutations

Distinguishing missense nucleotide substitutions that contribute to harmful effects from those that do not is a difficult problem, usually addressed through functional in vivo analyses. In this study, instead of current biochemical methods, the effects of missense mutations on protein structure and function were assessed by means of computational methods and information from databases. To this end, the effects of new missense mutations in exon 5 of the PTEN gene on protein structure and function were examined. The gene coding for PTEN was identified as a tumor suppressor gene and localized to chromosome region 10q23.3. These methods showed that the c.319G>A and c.341T>G missense mutations, which were identified in patients with breast cancer and Cowden disease, could be pathogenic. This method could be used to analyze missense mutations in other genes.

Effect of Calcium Chloride on Rheological Properties and Structure of Inulin - Whey Protein Gels

The aim of this study was to examine the rheological properties, structure, and potential synergistic interactions of whey proteins (1-6%) and inulin (20%) in mixed gels in the presence of CaCl2. Whey proteins have a strong influence on inulin gel formation. At a low concentration (2%), whey proteins did not impair inulin gel formation. At a higher concentration (4%), whey proteins impaired inulin gelation, and inulin impaired the formation of a Ca2+-induced whey protein network. The presence of whey proteins at a level allowing protein gel network formation (6%) significantly increased the values of the rheological parameters of the gels. SEM micrographs showed that the whey protein structure was coated by inulin moieties, which could make the mixed gels firmer. However, the protein surface hydrophobicity measurements did not exclude synergistic interactions between inulin and whey proteins. The use of an electrophoretic technique did not reveal any stable inulin-whey protein complexes.