Application of Smooth Ergodic Hidden Markov Model in Text to Speech Systems

In developing a text-to-speech system, the accuracy of the information extracted from the text is crucial to producing high-quality synthesized speech. This paper introduces and develops a new scheme for converting text into its equivalent phonetic spelling. The method is applicable to many text-to-speech conversion systems and has many advantages over other methods; it can also complement those methods to improve their performance. The proposed method is probabilistic and is based on the Smooth Ergodic Hidden Markov Model (SEHMM), which can be considered an extension of the hidden Markov model (HMM). The method is applied to the Persian language, and its accuracy in converting text to its phonetic form is evaluated using simulations.
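
As a rough illustration of how an HMM can map letters to phonemes, the sketch below runs standard Viterbi decoding over a small ergodic HMM in which hidden states stand for phonemes and observations stand for letters. It is not the SEHMM itself: the abstract does not specify the smoothing or training procedure, so none is shown. The function name `viterbi`, the toy Latin-letter alphabet, and every probability in the tables are assumptions made up for this example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model (all values invented for illustration).
# Ergodic: every transition probability is nonzero.
states = ["k", "s", "a", "t"]  # phoneme labels
start_p = {"k": 0.4, "s": 0.3, "a": 0.2, "t": 0.1}
trans_p = {
    "k": {"k": 0.05, "s": 0.05, "a": 0.8, "t": 0.1},
    "s": {"k": 0.1, "s": 0.1, "a": 0.4, "t": 0.4},
    "a": {"k": 0.2, "s": 0.2, "a": 0.1, "t": 0.5},
    "t": {"k": 0.1, "s": 0.2, "a": 0.6, "t": 0.1},
}
emit_p = {  # probability that a phoneme is written as a given letter
    "k": {"c": 0.9, "a": 0.05, "t": 0.05},
    "s": {"c": 0.8, "a": 0.1, "t": 0.1},
    "a": {"c": 0.05, "a": 0.9, "t": 0.05},
    "t": {"c": 0.05, "a": 0.05, "t": 0.9},
}

decoded = viterbi(list("cat"), states, start_p, trans_p, emit_p)
print(decoded)
```

In this toy model the letter "c" is ambiguous between the phonemes "k" and "s", and the transition probabilities are what let the decoder choose between them from context. A real system would estimate the tables from a pronunciation corpus rather than set them by hand.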



