Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading

Computerized lip reading has been one of the most actively researched areas of computer vision in the recent past because of its crime-fighting potential and its invariance to the acoustic environment. However, factors such as fast speech, poor pronunciation, poor illumination, face movement, moustaches and beards make lip reading difficult. In the present work, we propose a solution for automatic lip contour tracking and for recognizing spoken letters of the English language using only the information available from lip movements. The level set method is used to track the lip contour with a contour velocity model, and a feature vector of lip movements is then obtained. Character recognition is performed using a modified k-nearest-neighbor algorithm that assigns more weight to nearer neighbors. The proposed system achieves an accuracy of 73.3% for character recognition with the speaker's lip movements as the only input and without any speech recognition system running in parallel. The approach is found to serve the purpose of lip reading well when the database is small.
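
The idea of giving nearer neighbors more weight can be illustrated with a minimal sketch of distance-weighted k-nearest-neighbor voting. This is not the implementation used in this work: the inverse-distance weighting, the Euclidean metric, the function name `weighted_knn_predict`, and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_features, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest neighbors.

    Assumption: weights are 1 / (distance + eps); the paper's exact
    weighting scheme is not stated in the abstract.
    """
    # Euclidean distance from the query to every training vector.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest training samples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Each neighbor votes with weight 1 / (distance + eps): closer means heavier.
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + eps)
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


# Toy 2-D feature vectors (hypothetical values, not data from the paper).
train_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["A", "A", "B", "B", "B"]

if __name__ == "__main__":
    # With k=5 an unweighted majority vote would pick "B" (3 of 5 samples),
    # but the two nearby "A" samples carry more weight here.
    print(weighted_knn_predict(train_features, train_labels, (1.0, 0.5), k=5))
```

In this toy case the weighting changes the outcome relative to a plain majority vote, which suggests why such a scheme can help when only a few training samples per class are available.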



