Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading
Computerized lip reading has been one of the most
actively researched areas of computer vision in the recent past because
of its crime-fighting potential and its invariance to the acoustic
environment. However, several factors, such as fast speech, poor
pronunciation, poor illumination, face movement, moustaches, and beards,
make lip reading difficult. In the present work, we propose a solution for
automatic lip contour tracking and for recognizing letters of the English
language spoken by speakers, using only the information available from
lip movements. The level set method is used to track the lip contour
with a contour velocity model, and a feature vector of lip movements
is then obtained. Character recognition is performed using a modified
k-nearest-neighbor algorithm that assigns more weight to nearer
neighbors. The proposed system achieves an accuracy
of 73.3% for character recognition with speaker lip movements as
the only input and without any speech recognition system running in
parallel. The approach is found to serve the purpose of lip reading
well when the size of the database is small.
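
The recognition step described above, a k-nearest-neighbor vote in which nearer neighbors count for more, can be illustrated with a minimal sketch. The abstract does not state the exact weighting, so the inverse-distance weight used here, the Euclidean metric, the function name `weighted_knn_predict`, and the toy one-dimensional feature vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by a distance-weighted vote among its k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs.
    ASSUMPTION: each neighbor votes with weight 1 / (distance + eps); the
    source only says that nearer neighbors receive more weight.
    """
    # Euclidean distance from the query to every training vector, nearest first.
    neighbors = sorted(
        ((math.dist(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        # eps avoids division by zero when the query coincides with a sample.
        votes[label] += 1.0 / (distance + eps)
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


# Hypothetical 1-D "lip movement" features for two letters.
train = [((0.0,), "A"), ((3.0,), "B"), ((4.0,), "B")]
predicted = weighted_knn_predict(train, (0.5,), k=3)
```

In this toy case an unweighted majority vote over the three neighbors would pick "B" (two votes to one), while the weighted vote picks "A" because the single "A" sample lies much closer to the query. This is the behavior that motivates weighting in small databases, where a few distant samples can otherwise outvote one close match.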
@article{"International Journal of Electrical, Electronic and Communication Sciences:59052",
  author = "Harshit Mehrotra and Gaurav Agrawal and M.C. Srivastava",
  title = "Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading",
  abstract = "Computerized lip reading has been one of the most actively researched areas of computer vision in the recent past because of its crime-fighting potential and its invariance to the acoustic environment. However, several factors, such as fast speech, poor pronunciation, poor illumination, face movement, moustaches, and beards, make lip reading difficult. In the present work, we propose a solution for automatic lip contour tracking and for recognizing letters of the English language spoken by speakers, using only the information available from lip movements. The level set method is used to track the lip contour with a contour velocity model, and a feature vector of lip movements is then obtained. Character recognition is performed using a modified k-nearest-neighbor algorithm that assigns more weight to nearer neighbors. The proposed system achieves an accuracy of 73.3% for character recognition with speaker lip movements as the only input and without any speech recognition system running in parallel. The approach is found to serve the purpose of lip reading well when the size of the database is small.",
  keywords = "Contour Velocity Model, Lip Contour Tracking, Lip Reading, Visual Character Recognition.",
  volume = "3",
  number = "4",
  pages = "890-10",
}