Application of Smooth Ergodic Hidden Markov Model in Text to Speech Systems
In developing a text-to-speech system, it is well known that the
accuracy of the information extracted from the text is crucial to
producing high-quality synthesized speech. This paper introduces and
develops a new scheme for converting text into its equivalent
phonetic spelling. The method is applicable to a wide range of
text-to-speech conversion systems and offers advantages over other
methods. It can also complement those methods in order to improve
their performance. The proposed method is probabilistic and is based
on the Smooth Ergodic Hidden Markov Model (SEHMM), which can be
considered an extension of the hidden Markov model (HMM). The method
is applied to the Persian language, and its accuracy in converting
text to its phonetic transcription is evaluated using simulations.
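To make the idea of HMM-based text-to-phoneme conversion concrete, the following is a minimal sketch of Viterbi decoding with a plain HMM, in which phonemes are the hidden states and letters are the observations. It is not the SEHMM of this paper, whose details are not given in the abstract. The symbols, the probabilities, the `FLOOR` constant and the helper names (`log_table`, `viterbi`, `decode`) are all illustrative assumptions, and the floor is only a numerical guard against log(0), not the paper's smoothing.

```python
import math

FLOOR = 1e-12  # assumed guard against log(0); not the SEHMM smoothing


def log_table(table, rows, cols):
    """Turn a sparse probability table into a dense table of log-probabilities."""
    return {
        r: {c: math.log(max(table.get(r, {}).get(c, 0.0), FLOOR)) for c in cols}
        for r in rows
    }


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy model (made-up numbers): the letter "c" is read as /k/ or /s/
# depending on the vowel that follows it.
PHONEMES = ["k", "s", "a", "i"]
LETTERS = ["c", "a", "i"]
START = {"k": 0.5, "s": 0.5}
TRANS = {"k": {"a": 0.9, "i": 0.1}, "s": {"a": 0.1, "i": 0.9}}
EMIT = {"k": {"c": 1.0}, "s": {"c": 1.0}, "a": {"a": 1.0}, "i": {"i": 1.0}}

LOG_START = {s: math.log(max(START.get(s, 0.0), FLOOR)) for s in PHONEMES}
LOG_TRANS = log_table(TRANS, PHONEMES, PHONEMES)
LOG_EMIT = log_table(EMIT, PHONEMES, LETTERS)


def decode(word):
    """Map a string of letters to its most probable phoneme sequence."""
    return viterbi(list(word), PHONEMES, LOG_START, LOG_TRANS, LOG_EMIT)


print(decode("ca"))  # ['k', 'a']
print(decode("ci"))  # ['s', 'i']
```

In this toy model both readings of "c" are equally likely at the start, so the choice is made entirely by the transition to the following vowel. This is the kind of context dependence that a probabilistic sequence model can capture.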
@article{"International Journal of Information, Control and Computer Sciences:57540",
  author = "Armin Ghayoori and Faramarz Hendessi and Asrar Sheikh",
  title = "Application of Smooth Ergodic Hidden Markov Model in Text to Speech Systems",
  abstract = "In developing a text-to-speech system, it is well known that the
    accuracy of the information extracted from the text is crucial to
    producing high-quality synthesized speech. This paper introduces and
    develops a new scheme for converting text into its equivalent
    phonetic spelling. The method is applicable to a wide range of
    text-to-speech conversion systems and offers advantages over other
    methods. It can also complement those methods in order to improve
    their performance. The proposed method is probabilistic and is based
    on the Smooth Ergodic Hidden Markov Model (SEHMM), which can be
    considered an extension of the hidden Markov model (HMM). The method
    is applied to the Persian language, and its accuracy in converting
    text to its phonetic transcription is evaluated using simulations.",
  keywords = "Hidden Markov Models, text, synthesis.",
  volume = "2",
  number = "8",
  pages = "2711-7",
}