Abstract: A talking head system (THS) is presented that animates the
face of a speaking 3D avatar so that it realistically pronounces a
given Korean text. The proposed system consists of a SAPI-compliant
text-to-speech (TTS) engine and an MPEG-4-compliant face animation
generator. The input to the THS is a Unicode text to be spoken with
synchronized lip shapes. The TTS engine generates a phoneme sequence
together with phoneme durations and audio data. The TTS applies
coarticulation rules to the phoneme sequence and sends a mouth
animation sequence to the face modeler. By using the face animation
generator, the proposed THS produces more natural lip synchronization
and facial expression than systems that use conventional visemes only.
The experimental results show that our system has great potential for
the implementation of a talking head for Korean text.
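The pipeline sketched above (timed phonemes from the TTS, a viseme per phoneme, coarticulation at segment boundaries) can be illustrated with a minimal sketch. The phoneme set, the viseme table, and the transition rule below are all invented for illustration; they are not the paper's actual tables or its MPEG-4 face animation parameters.

```python
# Illustrative sketch, not the paper's implementation: turn a timed
# phoneme sequence from a TTS engine into per-frame mouth shapes, with
# a crude coarticulation rule at phoneme boundaries.

# Hypothetical phoneme-to-viseme table (labels are assumptions).
PHONEME_TO_VISEME = {
    "a": "open",     # wide open mouth
    "o": "round",    # rounded lips
    "m": "closed",   # bilabial closure
    "s": "narrow",   # narrow opening
}

def mouth_animation(phonemes, fps=25):
    """phonemes: list of (phoneme, duration_in_seconds) pairs, as a
    TTS engine might emit. Returns one viseme label per video frame.
    As a stand-in for coarticulation, the first frame of each segment
    is marked as a transition from the previous viseme."""
    frames = []
    for i, (ph, dur) in enumerate(phonemes):
        n = max(1, round(dur * fps))
        vis = PHONEME_TO_VISEME.get(ph, "neutral")
        for f in range(n):
            if f == 0 and i > 0:
                prev = PHONEME_TO_VISEME.get(phonemes[i - 1][0], "neutral")
                frames.append(f"{prev}->{vis}")
            else:
                frames.append(vis)
    return frames
```

For example, `mouth_animation([("m", 0.08), ("a", 0.12)])` yields two `closed` frames followed by a `closed->open` transition frame and two `open` frames at 25 fps.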
Abstract: This article investigates the contribution of synthesized visual speech. The synthesis of visual speech by a computer consists mainly of animating lip movements. Visual speech is also a necessary part of the non-manual component of a sign language. An appropriate methodology is proposed to determine the quality and accuracy of synthesized visual speech, and this methodology is examined on Czech speech. The article therefore presents a procedure for recording speech data, both to configure a synthesis system and to evaluate the synthesized speech. Furthermore, one option of the evaluation process is elaborated in the form of a perceptual test. This test procedure is verified on the measured data with two settings of the synthesis system. The results of the perceptual test show a statistically significant increase in intelligibility evoked by both real and synthesized visual speech. The aim is to present one part of the evaluation process that leads to a more comprehensive evaluation of the sign speech synthesis system.
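A significance claim like the one above is typically backed by a paired comparison of per-listener intelligibility scores with and without the visual channel. The sketch below shows one standard way to compute such a paired t statistic; the scores in the usage example are fabricated for illustration and have no relation to the article's measured data, and the article may well use a different test.

```python
import math
from statistics import mean, stdev

def paired_t(without_visual, with_visual):
    """Paired t statistic for per-listener intelligibility scores
    (e.g. percent of words correctly recognized), measured without
    and with visual speech. Returns (t, degrees_of_freedom).
    This is a generic textbook test, not the article's method."""
    diffs = [a - b for b, a in zip(without_visual, with_visual)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

With six hypothetical listeners, e.g. `paired_t([52, 60, 48, 55, 63, 50], [61, 70, 59, 64, 71, 58])`, the resulting t value is compared against the critical value for the chosen significance level at n-1 degrees of freedom.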
Abstract: This paper deals with automatic sentence modality
recognition in French. In this work, only prosodic features are
considered. Sentences are recognized according to the three
following modalities: declarative, interrogative and exclamatory.
This information will be used to animate a talking head for deaf and
hearing-impaired children. We first carry out a statistical study of
a real radio corpus in order to assess the feasibility of
automatically modeling sentence types. We then test two sets of
prosodic features as well as two different classifiers and their
combination. We further focus on question recognition, as this
modality is certainly the most important one for the target
application.
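To make the task concrete, a toy version of prosody-based modality recognition can be sketched as a rule on two pitch features. The features chosen (terminal F0 slope, F0 range) and all thresholds below are invented for illustration; the paper's actual feature sets and statistical classifiers are not specified here.

```python
# Illustrative sketch with assumed features and thresholds, not the
# paper's classifiers: decide sentence modality from two simple
# prosodic cues, the terminal F0 slope (Hz/s) and the F0 range (Hz).

def classify_modality(final_f0_slope, f0_range):
    """Rule-based stand-in for a trained classifier. A clear rising
    terminal contour suggests a question; a wide pitch excursion with
    no terminal rise suggests an exclamation; everything else is
    taken as declarative. All thresholds are hypothetical."""
    if final_f0_slope > 20.0:      # strong terminal rise
        return "interrogative"
    if f0_range > 120.0:           # wide pitch range
        return "exclamatory"
    return "declarative"
```

A real system would replace these hand-set thresholds with classifiers trained on a labeled corpus, as the paper does with its radio data.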