Eukaryotic Gene Prediction by an Investigation of Nonlinear Dynamical Modeling Techniques on EIIP Coded Sequences

Many digital signal processing, techniques have been used to automatically distinguish protein coding regions (exons) from non-coding regions (introns) in DNA sequences. In this work, we have characterized these sequences according to their nonlinear dynamical features such as moment invariants, correlation dimension, and largest Lyapunov exponent estimates. We have applied our model to a number of real sequences encoded into a time series using EIIP sequence indicators. In order to discriminate between coding and non coding DNA regions, the phase space trajectory was first reconstructed for coding and non-coding regions. Nonlinear dynamical features are extracted from those regions and used to investigate a difference between them. Our results indicate that the nonlinear dynamical characteristics have yielded significant differences between coding (CR) and non-coding regions (NCR) in DNA sequences. Finally, the classifier is tested on real genes where coding and non-coding regions are well known.

PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts

Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.