Investigating Breakdowns in Human Robot Interaction: A Conversation Analysis Guided Single Case Study of a Human-Robot Communication in a Museum Environment

In a single case study, we show how a conversation analysis (CA) approach can shed light on the sequential unfolding of human-robot interaction. Relying on video data, we show that CA allows us to investigate the respective turn-taking systems of humans and a NAO robot in their dialogical dynamics, thus pointing out relevant differences. Our fine-grained video analysis identifies breakdowns and how they are overcome when humans and a NAO robot engage in multimodal, multi-party communication during a sports guessing game. Our findings suggest that interdisciplinary work opens up the opportunity to gain new insights into the challenging issues of human-robot communication and to provide resources for developing mechanisms that enable complex human-robot interaction (HRI).

The Video Database for Teaching and Learning in Football Refereeing

This paper describes the video database tool used by the Fédération Internationale de Football Association (FIFA) as part of a research project developed in collaboration with the Carlos III University of Madrid. The database project began in 2012, with the aim of creating an educational tool for the training of instructors, referees and assistant referees, and it has been used in all FUTURO III courses since 2013. The platform now contains 3,135 video clips of different match situations from FIFA competitions. It has 1,835 users (FIFA instructors, referees and assistant referees). In this work, the main features of the database are described, such as the search tool and the creation of multimedia presentations and video quizzes. The database has been developed in MySQL, ActionScript, Ruby on Rails and HTML. The tool has been rated by users as "very good" in all courses, which prompts us to introduce it as an ideal tool for any other sport that requires the use of video analysis.

A Video-Based Observation and Analysis Method to Assess Human Movement and Behaviour in Crowded Areas

Human movement in the real world provides important information for developing human behaviour models and simulations. However, it is difficult to assess ‘real’ human behaviour since there is no established method available. As part of the AUNTSUE (Accessibility and User Needs in Transport – Sustainable Urban Environments) project, this research aimed to propose a method to assess human movement and behaviour in crowded areas. The method is based on three major steps: video recording, conceptual behaviour modelling and video analysis. The focus is on individual human movement and behaviour in normal situations (panic situations are not considered) and the interactions between individuals in localized areas. Emphasis is placed on gaining knowledge of characteristics of human movement and behaviour in the real world that can be modelled in the virtual environment.

Object Tracking System Using Camshift, Meanshift and Kalman Filter

This paper presents an implementation of an object tracking system for video sequences. Object tracking is an important task in many vision applications. Video analysis has two main steps: detection of interesting moving objects and tracking of such objects from frame to frame. Most tracking algorithms use pre-specified methods for preprocessing. In our work, we have implemented several object tracking algorithms (Meanshift, Camshift, Kalman filter) with different preprocessing methods and evaluated their performance on different video sequences. The results show good performance with respect to the degree of applicability and the evaluation criteria.
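
To illustrate the Kalman filter component only, the following is a minimal sketch of a constant-velocity filter tracking a one-dimensional position. It is not the implementation evaluated in the paper; the function name `kalman_track`, the noise parameters `q` and `r`, and the initial covariance are assumptions chosen for the example.

```python
def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Track a 1-D position with a constant-velocity Kalman filter.

    q (process noise) and r (measurement noise) are illustrative values.
    Returns a list of (position, velocity) estimates, one per measurement.
    """
    # State: position x and velocity v.
    # Covariance P = [[pxx, pxv], [pxv, pvv]], stored as three scalars.
    x, v = measurements[0], 0.0
    pxx, pxv, pvv = 1.0, 0.0, 1.0
    estimates = []
    for z in measurements:
        # Predict: x <- x + v*dt and P <- F P F^T + Q, F = [[1, dt], [0, 1]].
        x = x + v * dt
        pxx = pxx + 2 * dt * pxv + dt * dt * pvv + q
        pxv = pxv + dt * pvv
        pvv = pvv + q
        # Update with a position measurement z (H = [1, 0]).
        s = pxx + r                  # innovation covariance
        kx, kv = pxx / s, pxv / s    # Kalman gain
        y = z - x                    # innovation
        x, v = x + kx * y, v + kv * y
        # P <- (I - K H) P; the right-hand side uses the predicted values.
        pxx, pxv, pvv = (1 - kx) * pxx, (1 - kx) * pxv, pvv - kv * pxv
        estimates.append((x, v))
    return estimates
```

For an object moving at a steady two pixels per frame, `kalman_track([2.0 * t for t in range(20)])` yields a final velocity estimate close to 2. In a full tracker, the measurement would typically be the object position reported by a detector such as Meanshift or Camshift.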

Spatio-Temporal Video Slice Edges Analysis for Shot Transition Detection and Classification

In this work we present a new approach for automatic shot transition detection. Our approach is based on the analysis of Spatio-Temporal Video Slice (STVS) edges extracted from videos. The proposed approach can efficiently detect both abrupt shot transitions ('cuts') and gradual ones such as fade-in, fade-out and dissolve. Compared to other techniques, our method is distinguished by its high precision and speed. This performance is obtained by reducing the shot boundary detection problem to a simple 2D image partitioning problem.
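
The idea of a spatio-temporal slice can be sketched as follows: one row is taken from every frame and the rows are stacked over time, so that an abrupt cut appears as a sharp edge between consecutive rows of the resulting 2D image. This is only an illustration under simplifying assumptions (grayscale frames as nested lists, a fixed threshold, cuts only); the function names are hypothetical and the paper's edge analysis and handling of gradual transitions are not reproduced.

```python
def horizontal_slice(frames):
    """Stack the middle row of every frame into a time-by-width image."""
    return [frame[len(frame) // 2] for frame in frames]


def detect_cuts(slice_rows, threshold=50.0):
    """Return frame indices where the slice shows a strong temporal edge.

    The threshold is an illustrative value for 8-bit grayscale data.
    """
    cuts = []
    for t in range(1, len(slice_rows)):
        prev, cur = slice_rows[t - 1], slice_rows[t]
        # Mean absolute difference between two consecutive slice rows.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(t)
    return cuts


# Two synthetic shots of five 4x4 frames each: dark, then bright.
dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
frames = [dark] * 5 + [bright] * 5
cut_positions = detect_cuts(horizontal_slice(frames))
```

Here `cut_positions` contains the single index 5, the first frame of the second shot.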

Evaluation of Classifiers Based On I2C Distance for Action Recognition

Naive Bayes Nearest Neighbor (NBNN) and its variants, i.e., local NBNN and the NBNN kernels, are local feature-based classifiers that have achieved impressive performance in image classification. By exploiting instance-to-class (I2C) distances (instance means image/video in image/video classification), they avoid quantization errors of local image descriptors in the bag of words (BoW) model. However, the performance of NBNN, local NBNN and the NBNN kernels has not been validated on video analysis. In this paper, we introduce these three classifiers into human action recognition and conduct comprehensive experiments on the benchmark KTH and the realistic HMDB datasets. The results show that these I2C-based classifiers consistently outperform the SVM classifier with the BoW model.
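
A minimal sketch of the basic NBNN decision rule may help: the I2C distance of a query to a class is the sum, over the query's local descriptors, of the squared Euclidean distance to the nearest descriptor of that class, and the query is assigned to the class with the smallest distance. The function names and the toy two-dimensional descriptors are assumptions for illustration; local NBNN and the NBNN kernels are not covered.

```python
def i2c_distance(descriptors, class_descriptors):
    """Sum of squared distances from each descriptor to its nearest
    neighbour among the descriptors of one class."""
    total = 0.0
    for d in descriptors:
        total += min(
            sum((a - b) ** 2 for a, b in zip(d, c))
            for c in class_descriptors
        )
    return total


def nbnn_classify(descriptors, classes):
    """Return the label whose class has the smallest I2C distance."""
    return min(classes, key=lambda label: i2c_distance(descriptors, classes[label]))


# Toy example with hypothetical 2-D local descriptors.
classes = {
    "walk": [(0.0, 0.0), (1.0, 0.0)],
    "run": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.2, 0.1), (0.9, 0.1)]
label = nbnn_classify(query, classes)
```

No codebook is built at any point, which is how the quantization step of the BoW model is avoided.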

Action Recognition in Video Sequences using a Mealy Machine

In this paper we propose the use of sequential machines for recognizing actions taken by the objects detected by a general tracking algorithm. The system can deal with the uncertainty inherent in medium-level vision data. For this purpose, the input data are fuzzified. This transformation also allows data to be managed independently of the tracking application selected and enables characteristics of the analyzed scenario to be added. We describe the representation of actions by means of an automaton and the generation of the input symbols for the finite automaton depending on the object and action compared. The output of the comparison process between an object and an action is a numerical value that represents the membership of the object in the action. This value is computed depending on how similar the object and the action are. The work concludes with the application of the proposed technique to identify the behavior of vehicles in road traffic scenes.
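
The combination of fuzzified input and a Mealy machine can be sketched as below. Every detail is hypothetical: the speed labels, the membership functions, the states, the outputs and the use of the minimum membership as an overall degree are assumptions made for the example and are not taken from the paper.

```python
def fuzzify_speed(speed_kmh):
    """Map a crisp speed to the label with the highest membership."""
    memberships = {
        "stopped": max(0.0, 1.0 - speed_kmh / 5.0),
        "slow": max(0.0, 1.0 - abs(speed_kmh - 20.0) / 20.0),
        "fast": min(1.0, max(0.0, (speed_kmh - 30.0) / 20.0)),
    }
    label = max(memberships, key=memberships.get)
    return label, memberships[label]


# Mealy machine: (state, input symbol) -> (next state, output).
# The output depends on both the current state and the input.
TRANSITIONS = {
    ("cruising", "fast"): ("cruising", "none"),
    ("cruising", "slow"): ("braking", "slowing"),
    ("cruising", "stopped"): ("halted", "abrupt_stop"),
    ("braking", "fast"): ("cruising", "none"),
    ("braking", "slow"): ("braking", "none"),
    ("braking", "stopped"): ("halted", "controlled_stop"),
    ("halted", "fast"): ("cruising", "moving_off"),
    ("halted", "slow"): ("cruising", "moving_off"),
    ("halted", "stopped"): ("halted", "none"),
}


def run_mealy(speeds, start="cruising"):
    """Feed fuzzified speeds to the machine and collect its outputs.

    The returned degree is the smallest membership seen along the way.
    """
    state, outputs, degree = start, [], 1.0
    for speed in speeds:
        symbol, mu = fuzzify_speed(speed)
        state, out = TRANSITIONS[(state, symbol)]
        outputs.append(out)
        degree = min(degree, mu)
    return state, outputs, degree
```

With the speed sequence `[50, 20, 0]` the machine emits `none`, `slowing`, `controlled_stop`, whereas `[50, 0]` ends in `abrupt_stop`: the same final input produces a different output depending on the state reached.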

Scene Adaptive Shadow Detection Algorithm

Robustness is one of the primary performance criteria for an Intelligent Video Surveillance (IVS) system. One of the key factors in enhancing the robustness of dynamic video analysis is providing an accurate and reliable means of shadow detection. If left undetected, shadow pixels may result in incorrect object tracking and classification, as they tend to distort localization and measurement information. Most of the algorithms proposed in the literature are computationally expensive, some to the extent of equalling the computational requirements of motion detection. In this paper, the homogeneity property of shadows is explored in a novel way for shadow detection. The key novelty of our approach is an adaptive division image analysis (which highlights the homogeneity property of shadows) followed by a relatively simple projection histogram analysis for penumbra suppression.
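
As a rough illustration of a division image, the sketch below divides each pixel of the current frame by the corresponding background pixel and marks as shadow candidates those pixels whose ratio falls in a band below 1, on the assumption that a shadow darkens the background by a roughly uniform factor. The fixed bounds `low` and `high` and the function name are assumptions; the adaptive analysis and the projection histogram step for penumbra suppression are not reproduced.

```python
def shadow_mask(frame, background, low=0.4, high=0.9):
    """Mark pixels whose frame/background ratio lies in [low, high].

    frame and background are grayscale images given as nested lists.
    Pixels with a zero background value are never marked.
    """
    mask = []
    for f_row, b_row in zip(frame, background):
        mask.append(
            [b > 0 and low <= f / b <= high for f, b in zip(f_row, b_row)]
        )
    return mask


background = [[100, 100, 100]]
frame = [[100, 60, 10]]   # unchanged, shadowed, dark object
mask = shadow_mask(frame, background)
```

Only the middle pixel is marked: the first is unchanged and the third is too dark to be explained by a shadow under these bounds.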

Object Speed Estimation by using Fuzzy Set

Speed estimation is one of the important and practical tasks in machine vision, robotics and mechatronics. The availability of high-quality, inexpensive video cameras and the increasing need for automated video analysis have generated a great deal of interest in machine vision algorithms. Numerous approaches for speed estimation have been proposed, so a classification and survey of these methods can be very useful. The goal of this paper is first to review and verify these methods. We then propose a novel algorithm that estimates the speed of a moving object using fuzzy concepts. There is a direct relation between motion blur parameters and object speed. In our new approach we use the Radon transform to find the blur direction of the image, and fuzzy sets to estimate the motion blur length. The main benefit of this algorithm is its robustness and precision on noisy images. Our method was tested on many images with different SNR ranges and gave satisfactory results.

Key Frames Extraction for Sign Language Video Analysis and Recognition

In this paper we propose a method for finding video frames representing one sign in the finger alphabet. The method is based on determining hand location, segmentation and the use of standard video quality evaluation metrics. Metric calculation is performed only in regions of interest. A sliding mechanism for finding local extrema and an adaptive threshold based on local averaging are used for key frame selection. The success rate is evaluated by recall, precision and F1 measure. The effectiveness of the method is compared with that of metrics applied to all frames. The proposed method is fast, effective and relatively easy to realize through simple preprocessing of the input video and subsequent use of tools designed for video quality measurement.
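
The selection step can be sketched as a sliding window over a per-frame score. The sketch assumes that key frames correspond to local minima of the score (for example, little change between consecutive frames while a sign is held) and that the adaptive threshold is a fraction of the local mean; the abstract specifies neither, so `half_window`, `factor` and the choice of minima are assumptions.

```python
def key_frames(scores, half_window=2, factor=0.8):
    """Return indices that are local minima below an adaptive threshold.

    A frame is selected when its score is the minimum of the surrounding
    window and lies below factor times the mean of that window.
    """
    keys = []
    for i, s in enumerate(scores):
        lo = max(0, i - half_window)
        hi = min(len(scores), i + half_window + 1)
        window = scores[lo:hi]
        local_mean = sum(window) / len(window)
        if s == min(window) and s < factor * local_mean:
            keys.append(i)
    return keys


# Hypothetical per-frame change scores with two clear dips.
scores = [9, 8, 1, 8, 9, 9, 8, 2, 9, 9]
selected = key_frames(scores)
```

For these scores, `selected` is `[2, 7]`, the two frames where the score dips well below its neighbourhood.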