Inferring Hierarchical Pronunciation Rules from a Phonetic Dictionary

This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is inferred automatically from a phonetic dictionary by incrementally analyzing deeper context levels until it represents a minimum set of exhaustive rules that pronounce every word in the training dictionary without error and that can also be applied to out-of-vocabulary words. The proposed approach improves upon existing rule-tree-based techniques by using graphemes, rather than letters, as the elementary orthographic units. A new linear algorithm for segmenting a word into graphemes is introduced to enable grapheme-based phonetic transcription of out-of-vocabulary words. Exhaustive rule trees provide a canonical representation of the pronunciation rules of a language, which can be used not only to pronounce out-of-vocabulary words but also to analyze and compare the pronunciation rules inferred from different dictionaries. The proposed approach has been implemented in C and tested on Oxford British English and Basic English. Experimental results show that grapheme-based rule trees represent phonetically sound rules and perform better than letter-based rule trees.
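To make the idea of a hierarchical rule tree concrete, the following is a minimal illustrative sketch, written in Python for brevity rather than in the C of the actual implementation. It is not the authors' algorithm. It assumes that the dictionary is already aligned into (grapheme, phoneme) pairs, uses a hypothetical context order (`OFFSETS`: right neighbour, left neighbour, then two positions away, and so on), uses toy phoneme labels, and replaces the segmentation algorithm of the paper with a simple greedy longest-match stand-in. All function and variable names are invented for this example.

```python
from collections import Counter

# Hypothetical context order: right neighbour, left neighbour, two away, ...
OFFSETS = [1, -1, 2, -2, 3, -3]

def context(units, i, depth):
    """Grapheme at the depth-th context position; '#' beyond a word edge."""
    j = i + OFFSETS[depth]
    return units[j] if 0 <= j < len(units) else "#"

def build(instances, depth=0):
    """Build one rule-tree node from (units, index, phoneme) instances."""
    counts = Counter(p for _, _, p in instances)
    node = {"phoneme": counts.most_common(1)[0][0], "children": {}}
    # Stop when the rule is unambiguous or no context is left.
    # (In the latter case, e.g. homographs, the rule is not exhaustive.)
    if len(counts) == 1 or depth == len(OFFSETS):
        return node
    groups = {}
    for units, i, p in instances:
        groups.setdefault(context(units, i, depth), []).append((units, i, p))
    for ctx, group in groups.items():
        child = build(group, depth + 1)
        # Keep only children that add information over the parent's default.
        if child["children"] or child["phoneme"] != node["phoneme"]:
            node["children"][ctx] = child
    return node

def infer_tree(dictionary):
    """dictionary: list of words, each a list of (grapheme, phoneme) pairs."""
    by_grapheme = {}
    for word in dictionary:
        units = [g for g, _ in word]
        for i, (g, p) in enumerate(word):
            by_grapheme.setdefault(g, []).append((units, i, p))
    return {g: build(inst) for g, inst in by_grapheme.items()}

def segment(word, graphemes):
    """Greedy longest-match segmentation (a stand-in, linear in len(word)
    for a bounded grapheme length); unknown letters become single units."""
    max_len = max(map(len, graphemes))
    units, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            if n == 1 or word[i:i + n] in graphemes:
                units.append(word[i:i + n])
                i += n
                break
    return units

def transcribe(tree, units):
    """Follow the deepest matching rule for each grapheme.
    Raises KeyError for a grapheme never seen in training."""
    out = []
    for i, g in enumerate(units):
        node, depth = tree[g], 0
        while depth < len(OFFSETS):
            child = node["children"].get(context(units, i, depth))
            if child is None:
                break
            node, depth = child, depth + 1
        out.append(node["phoneme"])
    return out

# Toy aligned dictionary with made-up phoneme labels.
DICTIONARY = [
    [("c", "k"), ("a", "ae"), ("t", "t")],
    [("c", "s"), ("i", "ih"), ("t", "t"), ("y", "iy")],
    [("ch", "ch"), ("a", "ae"), ("t", "t")],
    [("c", "k"), ("o", "o"), ("t", "t")],
]
TREE = infer_tree(DICTIONARY)
print(transcribe(TREE, segment("chic", TREE)))  # ['ch', 'ih', 'k']
```

In this toy tree the rule for "c" defaults to "k" and keeps a single deeper rule, "c" followed by "i" gives "s"; the contexts "a" and "o" are pruned because they agree with the default. This is the sense in which such a tree can be kept small while still reproducing every training word.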




