Evaluation of Classifiers Based On I2C Distance for Action Recognition

Naive Bayes Nearest Neighbor (NBNN) and its variants, i.e., local NBNN and the NBNN kernels, are local feature-based classifiers that have achieved impressive performance in image classification. By exploiting instance-to-class (I2C) distances (where an instance is an image or a video, depending on the task), they avoid the quantization errors of local image descriptors in the bag-of-words (BoW) model. However, the performance of NBNN, local NBNN and the NBNN kernels has not been validated on video analysis. In this paper, we introduce these three classifiers into human action recognition and conduct comprehensive experiments on the benchmark KTH dataset and the realistic HMDB dataset. The results show that these I2C-based classifiers consistently outperform the SVM classifier with the BoW model.
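
To make the I2C idea concrete, the following is a minimal sketch of the basic NBNN decision rule as it is commonly described: each local descriptor of the query instance is matched to its nearest neighbor in the pooled descriptors of a class, the squared distances are summed into an I2C distance, and the class with the smallest sum is chosen. The function names, the brute-force search, and the plain-tuple descriptors are assumptions made for illustration; this is not the implementation evaluated in this paper, and it does not cover local NBNN or the NBNN kernels.

```python
from typing import Dict, List, Sequence

# A local descriptor is modeled here as a plain sequence of floats
# (an illustrative assumption; real descriptors would be feature vectors
# extracted around interest points).
Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two local descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def i2c_distance(query: List[Descriptor], class_pool: List[Descriptor]) -> float:
    """Instance-to-class distance: for every descriptor of the query
    instance, take the squared distance to its nearest neighbor in the
    class's descriptor pool, then sum over all query descriptors."""
    return sum(
        min(squared_distance(d, p) for p in class_pool)
        for d in query
    )


def nbnn_classify(query: List[Descriptor],
                  pools: Dict[str, List[Descriptor]]) -> str:
    """Return the label whose descriptor pool gives the smallest I2C distance."""
    return min(pools, key=lambda label: i2c_distance(query, pools[label]))


# Toy usage with hypothetical 2-D descriptors and two action classes.
toy_pools = {
    "walking": [(0.0, 0.0), (0.1, 0.2)],
    "running": [(1.0, 1.0), (0.9, 1.2)],
}
toy_query = [(0.05, 0.1), (0.2, 0.1)]
predicted = nbnn_classify(toy_query, toy_pools)
```

Note that no codebook is built and no descriptor is quantized: every query descriptor is compared directly against the raw training descriptors of each class, which is what lets I2C-based classifiers sidestep the quantization step of the BoW model. A practical version would presumably replace the brute-force inner loop with an approximate nearest-neighbor index.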




