Using Different Aspects of the Signings for Appearance-based Sign Language Recognition

Sign language is the primary means of communication for many deaf and hard-of-hearing people, which makes automatic sign language recognition an important and challenging research area. A sign comprises different components of visual action performed by the signer with the hands, the face, and the torso to convey meaning. To exploit these different aspects of signing, we combine several groups of features extracted from image frames recorded directly by a stationary camera. We combine the features at two levels, using three techniques. At the feature level, an early feature combination can be performed either by concatenating and weighting the different feature groups, or by concatenating feature groups over time and applying linear discriminant analysis (LDA) to select the most discriminative components. At the model level, a late fusion of separately trained models can be carried out by a log-linear model combination. In this paper, we investigate these three combination techniques in an automatic sign language recognition system and show that they significantly improve the recognition rate.
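The two fusion levels described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the helper names and weight values are hypothetical, and the log-linear combination simply weights and sums per-model log-probabilities before renormalizing.

```python
import numpy as np

def early_combination(feature_groups, weights):
    # Early fusion at the feature level: scale each feature group by its
    # weight and concatenate into a single feature vector.
    return np.concatenate([w * f for f, w in zip(feature_groups, weights)])

def log_linear_fusion(model_log_probs, lambdas):
    # Late fusion at the model level: a log-linear combination
    # p(w|x) proportional to prod_i p_i(w|x)^lambda_i, computed as a
    # weighted sum of per-model log-probabilities over the word classes.
    combined = sum(lam * lp for lam, lp in zip(lambdas, model_log_probs))
    combined -= combined.max()          # stabilize before exponentiating
    probs = np.exp(combined)
    return probs / probs.sum()          # renormalize to a distribution
```

For example, two feature groups of lengths 2 and 1 yield a single 3-dimensional vector under early fusion, while `log_linear_fusion` turns the scores of two separately trained models into one posterior over the vocabulary.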




