Eukaryotic Gene Prediction by an Investigation of Nonlinear Dynamical Modeling Techniques on EIIP Coded Sequences

Many digital signal processing techniques have been used to automatically distinguish protein-coding regions (exons) from non-coding regions (introns) in DNA sequences. In this work, we characterize these sequences by their nonlinear dynamical features: moment invariants, correlation dimension, and largest Lyapunov exponent estimates. We apply our model to a number of real sequences encoded as time series using EIIP sequence indicators. To discriminate between coding and non-coding DNA regions, the phase space trajectory is first reconstructed for each type of region. Nonlinear dynamical features are then extracted from those regions and used to investigate the differences between them. Our results indicate that the nonlinear dynamical characteristics differ significantly between coding regions (CR) and non-coding regions (NCR) in DNA sequences. Finally, the classifier is tested on real genes whose coding and non-coding regions are well known.
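
The first two steps described above, EIIP encoding followed by phase space reconstruction, can be illustrated with a minimal sketch. It assumes the commonly quoted EIIP values for the four nucleotides and a standard time-delay embedding; the function names, the embedding dimension `dim`, and the delay `tau` are illustrative choices, not values taken from this work.

```python
# Commonly quoted EIIP values for the four nucleotides (assumed here).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(sequence):
    """Map a DNA string to a list of EIIP values, one per nucleotide."""
    # Raises KeyError on symbols other than A, C, G, T (e.g. 'N').
    return [EIIP[base] for base in sequence.upper()]


def delay_embed(series, dim, tau):
    """Build time-delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested embedding")
    return [[series[i + k * tau] for k in range(dim)] for i in range(n_vectors)]


# Hypothetical toy sequence; dim and tau are arbitrary illustrative values.
signal = eiip_encode("ATGCGT")
trajectory = delay_embed(signal, dim=3, tau=1)
```

Each row of `trajectory` is one point of the reconstructed phase space trajectory. Features such as the correlation dimension or the largest Lyapunov exponent would then be estimated from these points, separately for coding and non-coding segments.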




