A New Predictor of Coding Regions in Genomic Sequences using a Combination of Different Approaches

Identifying protein-coding regions in DNA sequences is a basic step in locating genes. Several approaches based on signal processing tools have been applied to this problem in an effort to achieve more accurate predictions. This paper presents a new predictor that improves the efficacy of three techniques that use the Fourier Transform to predict coding regions, and that can be computed with an algorithm that reduces the computational load. Some ideas about combining the predictor with other methods are discussed. ROC curves, computed for 25 DNA sequences from three different organisms, are used to demonstrate the efficacy of the proposed predictor.
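As background on Fourier-based coding-region measures, the following minimal Python sketch computes a period-3 spectral measure: the summed squared magnitude of the discrete Fourier transform at frequency 1/3, taken over the four binary base-indicator sequences of a window. It illustrates the general class of technique only and is not the predictor proposed in this paper. The function names `period3_power` and `sliding_period3`, the window length, and the step size are all assumptions chosen for illustration.

```python
import cmath


def period3_power(window):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of each base's 0/1 indicator.

    Assumes the window length is a multiple of 3, so frequency 1/3 falls
    exactly on a DFT bin.
    """
    w = cmath.exp(-2j * cmath.pi / 3)  # e^{-j 2 pi / 3}; w**n depends only on n mod 3
    seq = window.upper()
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence for this base at bin N/3
        coeff = sum(w ** (n % 3) for n, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Return (start, measure) pairs for windows slid along seq.

    The defaults are illustrative values (hypothetical), not taken from the paper.
    """
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # A perfectly 3-periodic toy sequence gives a large value;
    # a 4-periodic one gives a value near zero.
    print(period3_power("ATG" * 30))
    print(period3_power("ACGT" * 30))
```

In a sketch like this, peaks of the measure along the sequence would be read as candidate coding regions. Recomputing each window from scratch is the naive approach, and it is this kind of cost that a reduced-load algorithm would aim to avoid.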




