Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ Architectures

This paper presents a set of Artificial Neural Network (ANN) based methods for designing an effective speech recognition system for Assamese numerals captured under varied recording conditions and speaker moods. Several ANN models are formulated that use Linear Predictive Coding (LPC), Principal Component Analysis (PCA) and other features to handle mood and gender variations in spoken numerals as part of an Automatic Speech Recognition (ASR) system for Assamese. The models combine a Self Organizing Map (SOM) and a Multi Layer Perceptron (MLP) to form a Learning Vector Quantization (LVQ) block, trained in a cooperative environment to handle male and female speech samples of numerals in Assamese, a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations on speech samples with gender-based differences, captured by a microphone under four conditions: noiseless, noise-mixed, stressed and stress-free.
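
As an illustration of the LPC features mentioned above, the following is a minimal sketch of LPC coefficient estimation for a single speech frame, using the standard autocorrelation method with the Levinson-Durbin recursion. The function name `lpc_coefficients`, the Hamming window, the frame length and the predictor order are assumptions made for this example; the abstract does not specify the feature extraction settings actually used.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients for one frame (hypothetical helper).

    Returns a[0..order] with a[0] = 1, the coefficients of the
    prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    frame = np.asarray(frame, dtype=float)
    # Assumption: a Hamming window is applied before analysis.
    windowed = frame * np.hamming(len(frame))

    # Autocorrelation at lags 0..order.
    n = len(windowed)
    r = np.array([np.dot(windowed[: n - k], windowed[k:])
                  for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err == 0.0:
        return a  # silent frame: nothing to predict

    # Levinson-Durbin recursion.
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # updated prediction error
    return a


if __name__ == "__main__":
    # Synthetic first-order autoregressive signal standing in for speech.
    rng = np.random.default_rng(0)
    x = np.zeros(400)
    for t in range(1, len(x)):
        x[t] = 0.9 * x[t - 1] + rng.standard_normal()
    print(lpc_coefficients(x, order=10))
```

In a recognizer of the kind described, one such coefficient vector per frame (or a quantity derived from it) would form part of the feature vector passed to the ANN classifier; how the frames are combined per numeral is not stated in the abstract.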



