Realtime Lip Contour Tracking For Audio-Visual Speech Recognition Applications
Detection and tracking of the lip contour is an important
issue in speechreading. While there are solutions for lip tracking
once a good contour initialization in the first frame is available,
the problem of finding such a good initialization is not yet solved
automatically and is still done manually. We have developed a new tracking
solution for lip contour detection that uses only a few landmarks (15
to 25) and applies the well-known Active Shape Models (ASM).
The proposed method is a new LMS-like adaptive scheme based on
an autoregressive (AR) model fitted to the landmark
variations in successive video frames. Moreover, we propose an extra
motion compensation model to address more general cases in lip
tracking. Computer simulations demonstrate a fair match between
the true and the estimated spatial pixels. Significant improvements
relative to the well-known LMS approach have been obtained, as measured
by a defined Frobenius norm index.
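
The abstract does not give the update equations, so the following is only a minimal sketch of the general idea: an AR predictor over successive landmark frames whose weights are adapted by a textbook normalized LMS rule, with a Frobenius-norm error between true and estimated coordinates. It is not the authors' LMS-like scheme. The function names, the shared weight vector, the step size `mu`, and the "repeat the previous frame" initialization are all assumptions made for illustration.

```python
import math


def lms_ar_track(frames, order=2, mu=0.5, eps=1e-9):
    """Predict each frame of flattened landmark coordinates from the
    previous `order` frames using an AR model adapted by normalized LMS.

    frames: list of equal-length lists of floats (x1, y1, x2, y2, ...).
    Returns (predictions for frames[order:], final AR weights).
    Assumption: one weight vector is shared by all coordinates.
    """
    n_coords = len(frames[0])
    # Assumed initialization: predict "same as previous frame".
    weights = [1.0] + [0.0] * (order - 1)
    predictions = []
    for t in range(order, len(frames)):
        # past[i] is the frame i + 1 steps before frame t.
        past = [frames[t - 1 - i] for i in range(order)]
        pred = [
            sum(weights[i] * past[i][j] for i in range(order))
            for j in range(n_coords)
        ]
        err = [frames[t][j] - pred[j] for j in range(n_coords)]
        # Normalize the step by the input energy to keep the update stable.
        energy = eps + sum(v * v for row in past for v in row)
        for i in range(order):
            grad = sum(err[j] * past[i][j] for j in range(n_coords))
            weights[i] += mu * grad / energy
        predictions.append(pred)
    return predictions, weights


def frobenius_index(true_frames, est_frames):
    """Frobenius norm of the (frames x coordinates) error matrix."""
    return math.sqrt(
        sum(
            (a - b) ** 2
            for true_row, est_row in zip(true_frames, est_frames)
            for a, b in zip(true_row, est_row)
        )
    )
```

A caller would pass the landmark coordinates of each video frame as one flat list and compare `frobenius_index(frames[order:], predictions)` across predictors; a lower value means a closer match between true and estimated positions.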
[1] R. Kaucic et al., "Real-time lip tracking for audio-visual speech recognition
applications," Proc. European Conf. Computer Vision, Cambridge,
UK, pp. 376-387, April 1996.
[2] S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous
speech recognition," IEEE Transactions on Multimedia, vol. 2, no. 3,
pp. 141-151, Sept. 2000.
[3] S. L. Wang, W. H. Lau, and S. H. Leung, "A new real-time lip contour
extraction algorithm," Proc. IEEE International Conference on Acoustics,
Speech and Signal Processing, ICASSP-03, Hong Kong, vol. 3, pp. 578-582,
April 2003.
[4] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey, "Extraction of
visual features for lipreading," IEEE Trans. on PAMI, vol. 24, pp. 198-213,
Feb. 2002.
[5] S. Mirsaidi, G. A. Fleury, and J. Oksman, "LMS-like AR modeling
in the case of missing observations," IEEE Trans. on Signal Processing,
vol. 45, no. 6, pp. 1574-1583, June 1997.
@article{Yazdi:53913,
  author = "Mehran Yazdi and Mehdi Seyfi and Amirhossein Rafati and Meghdad Asadi",
  title = "Realtime Lip Contour Tracking For Audio-Visual Speech Recognition Applications",
  journal = "International Journal of Electrical, Electronic and Communication Sciences",
  abstract = "Detection and tracking of the lip contour is an important
issue in speechreading. While there are solutions for lip tracking
once a good contour initialization in the first frame is available,
the problem of finding such a good initialization is not yet solved
automatically and is still done manually. We have developed a new tracking
solution for lip contour detection that uses only a few landmarks (15
to 25) and applies the well-known Active Shape Models (ASM).
The proposed method is a new LMS-like adaptive scheme based on
an autoregressive (AR) model fitted to the landmark
variations in successive video frames. Moreover, we propose an extra
motion compensation model to address more general cases in lip
tracking. Computer simulations demonstrate a fair match between
the true and the estimated spatial pixels. Significant improvements
relative to the well-known LMS approach have been obtained, as measured
by a defined Frobenius norm index.",
  keywords = "Lip contour, Tracking, LMS-Like",
  volume = "2",
  number = "4",
  pages = "578-4",
}