Exons and Introns Classification in Human and Other Organisms
In this paper, the relative performance of spectral classification of
short exon and intron sequences of human and eleven model organisms is
studied. In the simulations, all combinations of sixteen one-sequence
numerical representations, four threshold values, and four window
lengths are considered. Sequences 150 bases in length are chosen, and
for each organism, 16,000 sequences in total are used for training and
testing. Results indicate that choosing an appropriate combination of
one-sequence numerical representation, threshold value, and window
length is essential for achieving the best spectral classification
results. For fixed-length sequences, the exon and intron classification
precision differs from organism to organism because of their genomic
differences. In general, precision increases as sequence length
increases.
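The abstract does not spell out the classifier, so the following is only a hypothetical sketch of the general idea behind spectral exon/intron classification. Protein-coding regions are widely reported to show a peak at frequency 1/3 in the spectrum of their numerical signal. The sketch assumes EIIP values as the single one-sequence numerical representation (the paper compares sixteen), a sliding window of length `window`, and a rule that labels a sequence an exon when its average period-3 power ratio exceeds `threshold`. The names `to_signal`, `period3_ratio`, `sequence_score`, `precision`, and `sweep`, the ratio definition, and any window or threshold values are illustrative assumptions, not the authors' choices.

```python
import cmath

# EIIP (electron-ion interaction pseudopotential) value per nucleotide.
# Assumption: EIIP stands in for one of the paper's sixteen one-sequence
# numerical representations; the choice here is illustrative only.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_signal(seq):
    """Map a DNA string to a real-valued signal, one value per base."""
    return [EIIP[base] for base in seq.upper()]


def period3_ratio(x):
    """Power at normalized frequency 1/3 divided by the mean power over all
    DFT bins of the mean-removed window (a hypothetical normalization)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    # By Parseval's theorem, the mean DFT bin power equals the window energy.
    energy = sum(v * v for v in centered)
    if energy < 1e-12:  # flat window: no spectral content to measure
        return 0.0
    # DFT of the centered window evaluated at frequency 1/3 cycles per base.
    x3 = sum(v * cmath.exp(-2j * cmath.pi * m / 3) for m, v in enumerate(centered))
    return abs(x3) ** 2 / energy


def sequence_score(seq, window, step=1):
    """Average period-3 ratio over all sliding windows of length `window`."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    signal = to_signal(seq)
    ratios = [period3_ratio(signal[i:i + window])
              for i in range(0, len(signal) - window + 1, step)]
    return sum(ratios) / len(ratios)


def precision(predicted, actual):
    """Exon precision TP / (TP + FP); returns 0.0 if nothing is predicted as exon."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fp) if tp + fp else 0.0


def sweep(seqs, labels, windows, thresholds):
    """Exon precision for every (window, threshold) pair; label True means exon."""
    results = {}
    for w in windows:
        # Scores depend on the window length only, so compute them once per window.
        scores = [sequence_score(s, w) for s in seqs]
        for t in thresholds:
            predicted = [score > t for score in scores]
            results[(w, t)] = precision(predicted, labels)
    return results
```

The sketch sweeps only the threshold and window axes for a single representation. The full grid described in the abstract has 16 × 4 × 4 = 256 configurations per organism, which would add an outer loop over representation tables such as `EIIP`. A strictly period-3 sequence like `"ACG" * 50` scores `window / 2` (45 for a 90-base window), while a period-2 sequence like `"AT" * 75` scores essentially zero.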
@article{56306,
  author = "Benjamin Y. M. Kwan and Jennifer Y. Y. Kwan and Hon Keung Kwan",
  title = "Exons and Introns Classification in Human and Other Organisms",
  journal = "International Journal of Biological, Life and Agricultural Sciences",
  volume = "5",
  number = "12",
  pages = "913--914",
  keywords = "Exons and introns classification, Human genome, Model organism genome, Spectral analysis",
}