Neural Network Based Determination of Splice Junctions by ROC Analysis

A gene, the principal unit of inheritance, is an ordered sequence of nucleotides. The genes of eukaryotic organisms consist of alternating segments of exons and introns. An exon is a region of deoxyribonucleic acid (DNA) within a gene that contains the instructions for coding a protein. Introns, in contrast, are non-coding regions of DNA that are removed from the messenger ribonucleic acid (mRNA) during splicing and are involved in regulating gene expression. This paper proposes to determine splice junctions, the boundaries between exons and introns, by analyzing DNA sequences. A splice junction can be either exon-intron (EI) or intron-exon (IE). Because artificial neural networks (ANNs) are widely used and well suited to problems in genetics, several ANN models are applied in this research. Multi-layer Perceptron (MLP), Radial Basis Function (RBF) and Generalized Regression Neural Network (GRNN) models are used to analyze and detect the splice junctions of gene sequences. Ten-fold cross-validation is used to estimate the accuracy of the networks. The actual performance of these networks is then assessed by applying Receiver Operating Characteristic (ROC) analysis.
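As an illustration of the evaluation steps named above, the following Python sketch shows one way to encode DNA windows numerically, build ten cross-validation folds, and compute the area under the ROC curve from classifier scores. It is not the authors' implementation: the one-hot encoding, the interleaved fold assignment, and the function names `one_hot`, `k_fold_indices` and `roc_auc` are all assumptions made for this example, and the network itself (MLP, RBF or GRNN) is left out.

```python
from typing import List, Sequence, Tuple

NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> List[int]:
    """Encode a DNA string as 4 binary inputs per base (assumed encoding).

    Bases other than A, C, G, T are encoded as four zeros.
    """
    vec: List[int] = []
    for base in seq.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; sample i goes to fold i % k."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a random positive example receives a
    higher score than a random negative one; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical usage: labels are 1 for a junction and 0 otherwise, and
# scores stand in for the outputs of a trained network on a test fold.
features = one_hot("GTAAGT")
splits = k_fold_indices(n_samples=20, k=10)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In a full experiment, each of the ten test folds would be scored by a network trained on the other nine, and the ROC analysis would be applied to those held-out scores rather than to accuracy alone.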



