Effect of Visual Speech in Sign Speech Synthesis

This article investigates the contribution of synthesized visual speech. Visual speech synthesis renders a facial animation, in particular the movements of the lips. Visual speech is also a necessary part of the non-manual component of a sign language. A methodology is proposed for determining the quality and accuracy of synthesized visual speech, and it is examined on Czech speech. The article therefore presents a procedure for recording speech data, both to configure the synthesis system and to evaluate the synthesized speech. Furthermore, one option of the evaluation process is elaborated in the form of a perceptual test. This test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test show a statistically significant increase in intelligibility evoked by both real and synthesized visual speech. The aim is thus to present one part of an evaluation process that leads to a more comprehensive evaluation of the sign speech synthesis system.

