Echo State Networks for Arabic Phoneme Recognition

This paper presents an Echo State Network (ESN) based Arabic phoneme
recognition system trained with supervised, forced, and combined
supervised/forced supervised learning algorithms. Mel-Frequency
Cepstral Coefficients (MFCCs) and Linear Predictive Coding (LPC)
are used and compared as input feature extraction techniques.
The system is evaluated using 6 speakers from the King
Abdulaziz Arabic Phonetics Database (KAPD) for the Saudi Arabian
dialect and 34 speakers from the Center for Spoken Language
Understanding (CSLU2002) database, which covers speakers with different
dialects from 12 Arabic countries. Results for the KAPD and
CSLU2002 Arabic databases show phoneme recognition
performances of 72.31% and 38.20%, respectively.
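
For readers unfamiliar with ESNs, the following is a minimal sketch of the general technique, not the configuration used in this paper. It assumes a fixed random reservoir with tanh units, no output feedback, and a linear readout fitted by ridge regression on per-frame feature vectors (for example 13 MFCCs per frame). The function names, reservoir size, spectral radius, and ridge value are all illustrative assumptions.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for obtaining the echo state property.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, frames):
    """Drive the reservoir with a (T, n_in) feature sequence; return (T, n_res) states."""
    x = np.zeros(w.shape[0])
    states = np.empty((len(frames), w.shape[0]))
    for t, u in enumerate(frames):
        x = np.tanh(w_in @ u + w @ x)  # state update without output feedback
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-2):
    """Fit the linear readout by ridge regression; returns (n_classes, n_res)."""
    n_res = states.shape[1]
    a = states.T @ states + ridge * np.eye(n_res)
    return np.linalg.solve(a, states.T @ targets).T

def classify_frames(w_out, states):
    """Pick the class with the largest readout activation for each frame."""
    return np.argmax(states @ w_out.T, axis=1)
```

Only the readout weights are learned here; the reservoir stays fixed after initialization. Training with forced (teacher-forced) output feedback, as named in the abstract, would add a feedback term to the state update and is not shown.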




