Abstract: Motion capture devices have been used to produce various content, such as movies and video games. However, since motion capture devices are expensive and inconvenient to use, motions segmented from captured data have been recycled and synthesized for reuse in other content; these motions, however, have generally been segmented manually by content producers. Automatic motion segmentation has therefore recently received considerable attention. Previous approaches are divided into on-line and off-line methods: on-line approaches segment motions based on similarities between neighboring frames, while off-line approaches segment motions by capturing global characteristics in a feature space. In this paper, we propose a graph-based high-level motion segmentation method. Since high-level motions consist of several frames repeated within a temporal distance, we consider all similarities among all frames within that distance. This is achieved by constructing a graph in which each vertex represents a frame and the edges between frames are weighted by their similarity. The normalized cuts algorithm is then used to partition the constructed graph into several sub-graphs by globally finding minimum cuts. In the experiments, the proposed method outperformed a PCA-based on-line method and a GMM-based off-line method, as it segments motions globally from a graph that encodes similarities between neighboring frames as well as similarities among all frames within the temporal distance.
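The graph construction and normalized-cuts partitioning described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the per-frame feature vectors, the Gaussian similarity, and the temporal window size are illustrative assumptions, and only a single spectral bipartition (rather than recursive segmentation into several sub-graphs) is shown.

```python
import numpy as np

def normalized_cut_bipartition(frames, temporal_window=50, sigma=1.0):
    """Bipartition a motion sequence with a normalized cut (sketch).

    frames: (n, d) array of per-frame feature vectors (hypothetical).
    Edges connect frames within `temporal_window` of each other and
    are weighted by a Gaussian similarity on feature distance.
    """
    n = len(frames)
    W = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - temporal_window), min(n, i + temporal_window + 1)
        for j in range(lo, hi):
            dist2 = np.sum((frames[i] - frames[j]) ** 2)
            W[i, j] = np.exp(-dist2 / (2 * sigma ** 2))
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)          # ascending eigenvalues
    # The second-smallest eigenvector approximates the minimum
    # normalized cut; its sign pattern gives the two segments.
    fiedler = vecs[:, 1]
    return fiedler > 0
```

In the full method this bipartition would be applied recursively (or extended to k-way cuts) to obtain several motion segments.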
Abstract: Human pose estimation can be performed using Active Shape Models. Existing techniques that apply Active Shape Models to human-body research, such as human detection, primarily model the silhouette of the human body. Such techniques cannot accurately estimate human poses involving the two arms and legs, since the silhouette represents the body only as a rough outline. To solve this problem, we applied a stick-figure, or "skeleton", model of the human body. The skeleton model can accommodate various human pose shapes. To obtain effective estimation results, we applied background subtraction and a modified matching algorithm derived from the original Active Shape Models in the fitting process. The model was built from images of 600 human bodies and has 17 landmark points indicating body joints and key features of human pose. The maximum number of iterations in the fitting process was 30, and the execution time was less than 0.03 s.
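The projection-and-clamp step at the heart of ASM fitting can be sketched as below. This is a minimal illustration assuming orthonormal PCA shape modes; the standard ASM ±3σ parameter limits are used, while the pose alignment and local landmark search that the full fitting loop alternates with (up to 30 iterations above) are omitted.

```python
import numpy as np

def fit_shape_params(mean_shape, modes, eigvals, observed):
    """Project observed landmarks onto the ASM shape subspace and
    clamp each shape parameter to +/-3 standard deviations, keeping
    the fitted skeleton statistically plausible.

    mean_shape: (2k,) mean landmark vector (k landmarks, x/y stacked).
    modes:      (2k, m) orthonormal PCA shape modes.
    eigvals:    (m,) variance of each mode.
    observed:   (2k,) target landmarks found in the image.
    """
    b = modes.T @ (observed - mean_shape)   # least-squares shape parameters
    limit = 3.0 * np.sqrt(eigvals)
    b = np.clip(b, -limit, limit)           # constrain to plausible shapes
    return mean_shape + modes @ b           # reconstructed landmark vector
```

With 17 landmarks as in the abstract, `mean_shape` would be a 34-dimensional vector.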
Abstract: Although much research has been done on human pose recognition, the camera view-point remains a critical problem for recognition systems. In this paper, view-point insensitive human pose recognition is proposed. The aims of the proposed system are view-point insensitivity and real-time processing. The recognition system consists of a feature extraction module, a neural network, and real-time feed-forward calculation. First, a histogram-based method is used to extract features from the silhouette image; this representation is well suited to describing the shape of a human pose. To reduce the dimensionality of the feature vector, Principal Component Analysis (PCA) is used. Second, real-time processing is implemented using the Compute Unified Device Architecture (CUDA), which accelerates the feed-forward calculation of the neural network. We demonstrate the effectiveness of our approach with experiments in a real environment.
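The PCA step for reducing the histogram features can be sketched as follows. The histogram extraction and the CUDA feed-forward stage are omitted; the feature matrix here is a placeholder, and the SVD-based formulation is one standard way to compute PCA, not necessarily the paper's.

```python
import numpy as np

def pca_reduce(histograms, n_components):
    """Reduce histogram feature vectors with PCA (sketch).

    histograms: (n_samples, n_bins) matrix of feature vectors.
    Returns the centered data projected onto the top principal
    components, plus the mean and basis for projecting new samples.
    """
    mean = histograms.mean(axis=0)
    centered = histograms - mean
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:n_components].T             # (n_bins, n_components)
    return centered @ basis, mean, basis
```

A new silhouette's histogram `h` would then be reduced as `(h - mean) @ basis` before entering the neural network.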
Abstract: This paper proposes a view-point insensitive human pose recognition system using a neural network. The recognition system consists of a silhouette image capturing module, a data-driven database, and a neural network. Our system has three advantages. First, it can automatically capture silhouette images of a 3D human model from multiple view-points; this automatic capture module reduces the time-consuming task of database construction. Second, we build a large feature database to provide view-point insensitivity in pose recognition. Third, we use a neural network to recognize human poses from multiple views, because every pose yields similar feature patterns across models, even though each model has a different appearance and view-point. To construct the database, we create 3D human models using 3D modeling tools. The contour shape is used to convert each silhouette image into a 12-dimensional feature vector. This extraction task is processed semi-automatically, which has the benefit that capturing images and converting them to silhouettes in a real capture environment is unnecessary. We demonstrate the effectiveness of our approach with experiments in a virtual environment.
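A contour-shape feature of the kind described, 12 values per silhouette, might be computed as below. The choice of the per-bin maximum centroid-to-pixel distance and the scale normalization are illustrative assumptions, not necessarily the paper's exact descriptor.

```python
import numpy as np

def contour_feature(silhouette, n_bins=12):
    """Convert a binary silhouette image into a 12-dimensional
    contour feature vector (illustrative sketch): the maximum
    distance from the silhouette centroid to a foreground pixel in
    each of 12 angular bins, normalized by the overall maximum so
    the feature is scale-invariant."""
    ys, xs = np.nonzero(silhouette)
    cy, cx = ys.mean(), xs.mean()
    angles = np.arctan2(ys - cy, xs - cx)          # in [-pi, pi]
    dists = np.hypot(ys - cy, xs - cx)
    bins = ((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    feature = np.zeros(n_bins)
    for b, d in zip(bins, dists):
        feature[b] = max(feature[b], d)
    return feature / (feature.max() + 1e-9)
```

For a roughly circular silhouette all 12 entries come out near 1.0, while limbs extended in particular directions raise the corresponding bins.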
Abstract: Much research has addressed detecting collisions between real and virtual objects in 3D space. In general, these techniques require enormous computing power, so many systems are built on cloud, network, or distributed computing. For this reason, this paper proposes a novel, fast 3D collision detection algorithm between real and virtual objects using 2D intersection areas. The proposed algorithm uses four cameras and a coarse-to-fine method to improve the accuracy and speed of collision detection. In the coarse step, the system examines the intersection area between the real-object and virtual-object silhouettes from all camera views. The result of this step is the set of indices of virtual sensors that may be in collision in 3D space. To decide collisions accurately, in the fine step the system checks for collisions in 3D space using the visual hull algorithm. The performance of the algorithm is verified by comparison with an existing algorithm. We believe the proposed algorithm can benefit many other research areas and applications, such as HCI, augmented reality, and intelligent spaces.
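The coarse step can be sketched as a per-view silhouette-overlap test. The data layout here (binary masks per camera view, sensors keyed by index) is an assumption for illustration; the fine visual-hull check is not reproduced.

```python
import numpy as np

def coarse_collision_candidates(real_silhouettes, sensor_silhouettes):
    """Coarse step of the coarse-to-fine collision test (sketch).

    real_silhouettes: list of binary masks of the real object, one
    per camera view.  sensor_silhouettes: dict mapping a virtual
    sensor index to its projected binary masks in the same views.
    A sensor stays a collision candidate only if its silhouette
    overlaps the real object's silhouette in EVERY view; a single
    empty intersection rules the collision out, since the visual
    hull cannot intersect the sensor in that case.
    """
    candidates = []
    for idx, sensor_views in sensor_silhouettes.items():
        if all(np.logical_and(r, s).any()
               for r, s in zip(real_silhouettes, sensor_views)):
            candidates.append(idx)
    return candidates
```

Only the surviving candidate indices would then be passed to the exact 3D test in the fine step.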
Abstract: Finger spelling is the art of communicating by signs made with the fingers; it has been introduced into sign language to serve as a bridge between sign language and verbal language. Previous approaches to finger spelling recognition fall into two categories: glove-based and vision-based. The glove-based approach recognizes hand postures more simply and accurately than the vision-based one, yet its interface requires the user to wear a cumbersome device and carry a load of cables connecting it to a computer. In contrast, vision-based approaches provide an attractive alternative to this cumbersome interface and promise more natural and unobtrusive human-computer interaction. Vision-based approaches generally consist of two steps, hand extraction and recognition, which are processed independently. This paper proposes a real-time vision-based Korean finger spelling recognition system that integrates hand extraction into recognition. First, we tentatively detect the hand region using the CAMShift algorithm. Then the fill factor and aspect ratio, estimated from the width and height given by CAMShift, are used to choose candidates from the database, which reduces the number of matches in the recognition step. To recognize the finger spelling, we use dynamic time warping (DTW) based on modified chain codes, which is robust to scale and orientation variations. In this procedure, since accurate hand regions, free of holes and noise, must be extracted to improve precision, we use the graph cuts algorithm, which globally minimizes an energy function elegantly expressed by Markov random fields (MRFs). In the experiments, the computational time is less than 130 ms and does not depend on the number of finger spelling templates in the database, as candidate templates are selected in the extraction step.
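The DTW matching over chain codes can be sketched as follows. Here `chain_cost` is a plain circular difference between 8-direction chain codes; the paper's modified chain codes, which provide the scale and orientation robustness, are not reproduced.

```python
import numpy as np

def dtw_distance(seq_a, seq_b, cost):
    """Dynamic time warping distance between two symbol sequences,
    e.g. chain codes of hand contours.  cost(a, b) is a per-symbol
    dissimilarity; the standard DP recurrence allows stretching
    either sequence so differing lengths can still align."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost(seq_a[i - 1], seq_b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def chain_cost(a, b):
    """Circular difference between 8-direction chain codes."""
    d = abs(a - b) % 8
    return min(d, 8 - d)
```

Recognition would compare the input chain code against each candidate template chosen in the extraction step and return the template with the smallest DTW distance.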
Abstract: In this paper, we propose an improved 3D star skeleton technique, a skeletonization well suited to human posture representation that reflects the 3D information of the posture. Moreover, the proposed technique is simple and can therefore run in real time. Existing skeleton construction techniques, such as distance transformation, Voronoi diagrams, and thinning, focus on the precision of the skeleton; they are thus not applicable to real-time posture recognition, since they are computationally expensive and highly susceptible to boundary noise. Although a 2D star skeleton was proposed to address these problems, it is limited in describing the 3D information of a posture. To represent human posture effectively, the constructed skeleton should reflect the 3D information of the posture. The proposed 3D star skeleton incorporates 3D human data and targets human action and posture recognition. Our 3D star skeleton uses 8 projection maps containing 2D silhouette information and depth data of the human surface, and the extremal points can be extracted as features of the 3D star skeleton without searching the whole object boundary. Therefore, in execution time, our 3D star skeleton is faster than the "greedy" 3D star skeleton, which uses all boundary points on the surface. Moreover, our method yields a more accurate posture skeleton than the existing star skeleton, since the 3D data of the object is taken into account. Additionally, we build a codebook, a collection of representative 3D star skeletons for 7 postures, to recognize the posture of a constructed skeleton.
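For reference, the 2D star skeleton that the proposed method extends finds its extremal points as local maxima of the centroid-to-boundary distance function. The sketch below illustrates that idea only; the smoothing width is an assumed parameter, and the proposed 3D variant, which works on 8 projection maps instead of the full boundary, is not reproduced.

```python
import numpy as np

def star_skeleton_extremes(boundary, centroid, smooth=5):
    """Extremal points of a 2D star skeleton (illustrative sketch).

    boundary: (n, 2) array of boundary points in traversal order.
    Returns indices of local maxima of the smoothed centroid-to-
    boundary distance, which mark head/limb candidates.
    """
    d = np.linalg.norm(boundary - centroid, axis=1)
    # Smooth the circular distance signal to suppress boundary noise.
    kernel = np.ones(smooth) / smooth
    pad = smooth // 2
    d_s = np.convolve(np.pad(d, pad, mode='wrap'), kernel, mode='valid')
    # Local maxima on the circular signal.
    prev, nxt = np.roll(d_s, 1), np.roll(d_s, -1)
    return np.nonzero((d_s > prev) & (d_s >= nxt))[0]
```

These extremal points, connected back to the centroid, form the "star"; the proposed method instead extracts them from depth-augmented projection maps.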
Abstract: For communication between humans and computers in interactive computing environments, gesture recognition has been studied vigorously, and many studies have proposed efficient recognition algorithms using images captured by 2D cameras. However, these methods share a limitation: the extracted features cannot fully represent the object in the real world. Although many studies have used 3D features instead of 2D features for more accurate gesture recognition, problems such as the processing time required to generate 3D objects remain unsolved. We therefore propose a method to extract 3D features combined with 3D object reconstruction. This method uses a modified GPU-based visual hull generation algorithm that disables unnecessary processes, such as texture calculation, to generate three kinds of 3D projection maps as the 3D features: the nearest boundary, the farthest boundary, and the thickness of the object projected onto the base plane. In the experimental results, we present results of the proposed method on eight human postures (T shape, both hands up, right hand up, left hand up, hands front, stand, sit, and bend) and compare the computational time of the proposed method with that of previous methods.
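The three projection maps can be sketched from a boolean voxel grid as below. The voxel representation and the base-plane orientation (the z axis here) are assumptions for illustration; the paper computes these maps during GPU-based visual hull generation rather than from an explicit grid.

```python
import numpy as np

def projection_maps(voxels):
    """Compute the three projection-map features (sketch): for each
    (x, y) column of a boolean voxel occupancy grid, the nearest
    occupied z slice (toward the base plane), the farthest occupied
    z slice, and the thickness (count of occupied slices).  Empty
    columns are marked with -1 in the boundary maps."""
    occupied = voxels.any(axis=2)
    # argmax on a boolean axis returns the first True index.
    nearest = np.where(occupied, np.argmax(voxels, axis=2), -1)
    # argmax on the reversed axis gives the last occupied slice.
    farthest = np.where(
        occupied,
        voxels.shape[2] - 1 - np.argmax(voxels[:, :, ::-1], axis=2),
        -1)
    thickness = voxels.sum(axis=2)
    return nearest, farthest, thickness
```

The three resulting 2D maps together serve as the 3D feature fed to the posture classifier.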