FSM-based Recognition of Dynamic Hand Gestures via Gesture Summarization Using Key Video Object Planes

The use of the human hand as a natural interface for human-computer interaction (HCI) motivates research in hand gesture recognition. Vision-based hand gesture recognition involves visual analysis of hand shape, position and/or movement. In this paper, we use the concept of object-based video abstraction to segment the frames into video object planes (VOPs), as used in MPEG-4, with each VOP corresponding to one semantically meaningful hand position. Next, the key VOPs are selected on the basis of the amount of change in hand shape: for a given key frame in the sequence, the next key frame is the one in which the hand changes its shape significantly. Thus, an entire video clip is transformed into a small number of representative frames that are sufficient to represent a gesture sequence. Subsequently, we model a particular gesture as a sequence of key frames, each bearing information about its duration; together these key frames constitute a finite state machine (FSM). For recognition, the states of the incoming gesture sequence are matched against the states of all the FSMs contained in the database of the gesture vocabulary. The core idea of the proposed representation is that the redundant frames of a gesture video sequence carry only the temporal information of the gesture and hence are discarded for computational efficiency. Experimental results demonstrate the effectiveness of the proposed scheme for key frame extraction, gesture summarization and, finally, gesture recognition.

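To make the key-VOP selection and FSM-matching ideas in the abstract concrete, the following is a minimal Python sketch, not the paper's actual implementation. It assumes hand shapes are available as 2-D contour point sets, uses a symmetric Hausdorff distance as one plausible measure of shape change, and all names (hausdorff, select_key_vops, GestureFSM, recognize), thresholds and duration bounds are illustrative assumptions.

import numpy as np

def hausdorff(a, b):
    # Symmetric Hausdorff distance between two contour point sets (N x 2, M x 2).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def select_key_vops(contours, threshold=15.0):
    # Keep a frame as a key VOP only when its hand contour deviates from the
    # most recent key VOP by more than `threshold` (illustrative shape-change test).
    keys = [0]
    for i in range(1, len(contours)):
        if hausdorff(contours[keys[-1]], contours[i]) > threshold:
            keys.append(i)
    return keys

class GestureFSM:
    # One gesture = an ordered list of states, each a (prototype shape, (min_dur, max_dur)) pair.
    def __init__(self, name, states):
        self.name, self.states = name, states

    def matches(self, observed, shape_dist, tol=15.0):
        # observed: list of (contour, duration) pairs for the key frames of the incoming sequence.
        if len(observed) != len(self.states):
            return False
        for (shape, dur), (proto, (lo, hi)) in zip(observed, self.states):
            if shape_dist(shape, proto) > tol or not (lo <= dur <= hi):
                return False
        return True

def recognize(observed, fsm_database, shape_dist=hausdorff):
    # Return the name of the first gesture FSM whose states all accept the observed key frames.
    for fsm in fsm_database:
        if fsm.matches(observed, shape_dist):
            return fsm.name
    return None

# Usage with synthetic "shapes" (squares standing in for hand contours):
open_hand = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)
fist      = np.array([[0, 0], [4, 0], [4, 4], [0, 4]], dtype=float)
wave = GestureFSM("wave", [(open_hand, (2, 30)), (fist, (2, 30)), (open_hand, (2, 30))])
observed = [(open_hand, 5), (fist, 7), (open_hand, 6)]
print(recognize(observed, [wave]))  # -> "wave"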
