Identifying New Sequence Features for Exon-Intron Discrimination by Rescaled-Range Frameshift Analysis

To identify sequence features that discriminate between exons and introns, a new paradigm, rescaled-range frameshift analysis (RRFA), was proposed. Using RRFA, two new sequence features were discovered: frameshift sensitivity (FS) and accumulative penta-mer complexity (APC). These were further integrated into a larger-scale feature, the persistency in anti-mutation (PAM). Feature-validation experiments were performed on six model organisms to test the discriminative power of these features. The experimental results strongly support FS, APC and PAM as features that distinguish exons from introns. These new sequence features provide insight into the sequence composition of genes and have great potential to form a new basis for recognizing exon-intron boundaries in gene sequences.
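The abstract names the rescaled-range statistic without defining it. The sketch below computes the classical Hurst R/S statistic on a DNA sequence after mapping it to numbers. Several pieces are assumptions for illustration only and are not the paper's definitions of RRFA or FS. These include the purine/pyrimidine ±1 mapping and the hypothetical helper `frameshift_delta`, which simply recomputes R/S after deleting the first few bases.

```python
import math

# Assumed binary mapping (purine = +1, pyrimidine = -1), a common choice in
# DNA-walk studies; not necessarily the mapping used for RRFA.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def to_numeric(seq: str) -> list[float]:
    """Map a DNA string to +/-1 values, skipping ambiguous bases such as N."""
    return [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE]


def rescaled_range(values: list[float]) -> float:
    """Classical Hurst R/S statistic of one window.

    R is the range of the cumulative deviations from the window mean;
    S is the population standard deviation of the window.
    Returns 0.0 for a constant window, where S would be zero.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    cumulative, running = [], 0.0
    for v in values:
        running += v - mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return r / s if s > 0 else 0.0


def frameshift_delta(seq: str, shift: int = 1) -> float:
    """Hypothetical illustration: change in R/S after deleting the first
    `shift` bases (a crude frameshift). Not the paper's FS definition."""
    original = rescaled_range(to_numeric(seq))
    shifted = rescaled_range(to_numeric(seq[shift:]))
    return shifted - original


if __name__ == "__main__":
    print(rescaled_range(to_numeric("AGCT")))  # 2.0
    print(frameshift_delta("AGCTTAGGCA", shift=1))
```

A real analysis would apply the statistic over many window sizes and compare exon and intron populations. This sketch only shows the single-window computation.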
