Abstract: Tracking moving people has become a matter of great importance owing to rapid technological advances in computer vision. The objective of this study is to design a motion-based method for detecting and tracking multiple pedestrians walking in arbitrary directions. In the proposed method, a Gaussian mixture model (GMM) is used to detect moving persons in image sequences. It adapts to changes in the scene, such as varying illumination and objects that start and stop moving frequently. Background noise in the scene is eliminated by applying morphological operations, and the motion of each tracked person is estimated with a Kalman filter. The Kalman filter is applied to predict the tracked location in each frame and to determine the likelihood of each detection. We evaluated the method on a benchmark data set recorded by a stationary camera mounted on a side wall. The scenes in the data set were recorded on a street with up to eight people in front of the camera, in two different scenes lasting 53 and 35 seconds, respectively. When pedestrians walk in close proximity, the proposed method achieves a detection ratio of 87% and a tracking ratio of 77%. When they are farther apart, the detection ratio increases to 90% and the tracking ratio to 79%.
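The predict and update roles of the Kalman filter described in this abstract can be illustrated with a minimal sketch. It assumes a constant-velocity motion model for a single image coordinate; the class name `Kalman1D`, the noise values `q` and `r`, and the simplified process noise are illustrative assumptions, not the parameters used in the study.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Kalman1D:
    """Constant-velocity Kalman filter for one image coordinate (hypothetical helper)."""
    pos: float
    vel: float = 0.0
    # State covariance P as a 2x2 matrix; large values mean an uncertain start.
    P: list = field(default_factory=lambda: [[100.0, 0.0], [0.0, 100.0]])
    q: float = 0.01  # process noise added to the diagonal of P (assumed value)
    r: float = 4.0   # measurement noise variance in pixels^2 (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        """Predict the position in the next frame: x <- F x, P <- F P F^T + Q."""
        self.pos += dt * self.vel
        (a, b), (c, d) = self.P
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.pos

    def update(self, z: float) -> float:
        """Correct the state with detection z; return the Gaussian likelihood of z."""
        (a, b), (c, d) = self.P
        y = z - self.pos          # innovation (detection minus prediction)
        s = a + self.r            # innovation variance, with H = [1, 0]
        k0, k1 = a / s, c / s     # Kalman gain
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return math.exp(-0.5 * y * y / s) / math.sqrt(2 * math.pi * s)
```

In a multi-person tracker, one such filter per coordinate and per track would typically be predicted every frame, and the returned likelihood could be used to decide which detection belongs to which track.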
Abstract: 3D video has been an active research topic over the last few decades, and free-viewpoint TV (FTV) is a promising field because of its improved visual experience and interactivity. View synthesis is a crucial technology for FTV; it enables images to be rendered at an unlimited number of virtual viewpoints using information from a limited number of reference views. In this paper, a novel hybrid synthesis framework is proposed and blending priority is explored. In contrast to the commonly used View Synthesis Reference Software (VSRS), the presented synthesis process takes into account the temporal correlation of image sequences. The temporal correlations are exploited to produce fine synthesis results even near foreground boundaries. As for the blending priority, the scheme selects one of the two reference views as the main reference view based on the distance between the reference views and the virtual view; the other view serves as the auxiliary view, which only assists in filling hole pixels with the help of background information. The experiments show a significant improvement over the state-of-the-art pixel-based virtual view synthesis method: subjective gains can be observed, average objective PSNR gains range from 0.5 to 1.3 dB, and average SSIM gains range from 0.01 to 0.05.
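For readers unfamiliar with the PSNR figures quoted here, the following sketch shows how PSNR is commonly computed between a reference image and a synthesized one. It assumes 8-bit grayscale images stored as nested lists; the function name `psnr` and the `peak` parameter are illustrative and not taken from the paper.

```python
import math


def psnr(reference, synthesized, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_error = 0.0
    count = 0
    for ref_row, syn_row in zip(reference, synthesized):
        for r, s in zip(ref_row, syn_row):
            squared_error += (r - s) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

A gain of 0.5 to 1.3 dB, as reported above, corresponds to the difference between the PSNR of two synthesis methods measured against the same ground-truth view.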
Abstract: This paper explores the development and optimization of a method and apparatus for retrieving extended high dynamic range from a digital negative image. Architectural photography can benefit from the high dynamic range imaging (HDRI) technique for preserving and presenting sufficient luminance in image areas where shadows and highlights clip. An HDRI technique that requires multiple exposures as the source for HDRI rendering may not be time-efficient during acquisition and post-processing, because the multiple-exposure process involves numerous imaging variables and technical limitations. This paper explores an experimental method and apparatus that aims to expand the dynamic range obtained from a digital negative image in an HDRI environment. The method and apparatus are based on a single RAW image acquisition used as the source for HDRI post-processing. The optimization aims to avoid or minimize the conventional HDRI photographic errors caused by changing physical conditions during shooting and by the misalignment of multiple-exposure image sequences. The study examines the characteristics and capabilities of the RAW image format as a digital negative for retrieving extended dynamic range in an HDRI environment.
Abstract: The face and facial expressions play essential roles in
interpersonal communication. Most current work on facial
expression recognition attempts to recognize a small set of
prototypic expressions such as happiness, surprise, anger, sadness,
disgust and fear. However, most human emotions are
communicated by changes in one or two discrete features. In this
paper, we develop a facial expression synthesis system based on
tracking facial characteristic points (FCPs) in frontal image
sequences. Selected FCPs are automatically tracked using
cross-correlation-based optical flow. The proposed synthesis system
uses a simple deformable facial feature model with a small set of
control points that can be tracked in the original facial image sequences.
Abstract: This paper describes a new method for affine parameter
estimation between image sequences. Parameter estimation is
usually performed by least squares, i.e., with a quadratic
error function. However, this technique can be sensitive to the
presence of outliers. Parameter estimation techniques for various
image processing applications therefore need to be robust enough to
withstand the influence of outliers. Increasingly, robust estimation
functions with non-quadratic and possibly non-convex potentials,
adopted from the statistics literature, have been used for this purpose.
To optimize the error function within a framework aimed at
finding a globally optimal solution, the minimization can begin with
a convex estimator at the coarser level and gradually introduce
non-convexity, i.e., move from soft to hard redescending non-convex
estimators as the iteration reaches the finer levels of the
multiresolution pyramid. The results of the proposed method are
compared with those obtained using each of two different estimators
individually.
Abstract: Optical flow has been a research topic of interest for many
years. It has, until recently, been largely inapplicable to real-time
applications due to its computational expense. This paper
presents a new reliable flow technique which is combined with a
motion detection algorithm, operating on image streams from a
stationary camera, to allow flow-based analyses of moving entities,
such as rigidity, in real time. Combining optical flow analysis with
motion detection greatly reduces the expensive computation of
flow vectors compared with standard approaches, making the
method applicable to real-time implementation. This paper
also describes the hardware implementation of a proposed pipelined
system to estimate the flow vectors from image sequences in real
time. This design can process 768 x 576 images at a very high frame
rate of up to 156 fps on a single low-cost FPGA chip, which is
adequate for most real-time vision applications.
Abstract: The ability to recognize humans and their activities by computer vision is very important, with many potential applications. The study of human motion analysis is related to several research areas of computer vision, such as motion capture, detection, tracking and segmentation of people. In this paper, we describe a segmentation method for extracting the human body contour in a modified HLS color space. To estimate the background, a modified HLS color space is proposed, and the background features are estimated using the HLS color components. A large human dataset, collected with DV cameras, is pre-processed. The human body and its contour are successfully extracted from the image sequences.
Abstract: Real-time hand tracking is a challenging task in many
computer vision applications such as gesture recognition. This paper
proposes a robust method for hand tracking in a complex environment
using mean-shift analysis and a Kalman filter in conjunction with a 3D
depth map. The depth information, which is obtained by passive stereo
measurement based on cross-correlation and the known calibration data
of the cameras, solves the problem of overlap between hands and face.
Mean-shift analysis uses the gradient of the Bhattacharyya
coefficient as a similarity function to derive the hand candidate
that is most similar to a given hand target model. A Kalman
filter is then used to estimate the position of the hand target. The
results of hand tracking, tested on various video sequences, are robust
to changes in shape as well as partial occlusion.
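The similarity measure named in this abstract can be sketched as follows. The Bhattacharyya coefficient between two normalized histograms is the sum of the square roots of their bin-wise products; the histogram layout and the function name are assumptions for illustration, and the mean-shift iteration itself is not shown.

```python
import math


def bhattacharyya_coefficient(target_hist, candidate_hist):
    """Similarity in [0, 1] between two histograms with the same number of bins."""
    t_total = float(sum(target_hist))
    c_total = float(sum(candidate_hist))
    if t_total == 0 or c_total == 0:
        raise ValueError("histograms must not be empty")
    # Normalize so that each histogram sums to 1, then sum sqrt(p_i * q_i).
    return sum(
        math.sqrt((t / t_total) * (c / c_total))
        for t, c in zip(target_hist, candidate_hist)
    )
```

A value near 1 means the candidate region's colour distribution closely matches the target model, and a value near 0 means the distributions barely overlap.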
Abstract: The Gaussian mixture background model is widely used for
moving target detection in image sequences. However, the traditional
Gaussian mixture background model usually considers only the temporal
continuity of pixels, and establishes the background through the
statistical distribution of pixels without taking into account the
pixels' spatial similarity, which causes noise, imperfections and
other problems. This paper proposes a new Gaussian mixture modeling
approach, which combines the color and gradient of the spatial
information, and integrates the spatial information of the pixel
sequences to establish the Gaussian mixture background. The
experimental results show that the movement background can be
extracted accurately and efficiently, that the algorithm is more
robust, and that it can work in real time in tracking
applications.
Abstract: This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied to background-differenced image sequences. For efficiency, all calculations are performed on integral images. Novel corresponding exponential integral kernels are introduced to allow the application of nonuniform kernels for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.
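The efficiency argument rests on integral images, which allow the sum over any axis-aligned rectangle to be read with four lookups. The sketch below shows the standard uniform-kernel case only; the exponential integral kernels introduced in the paper are not reproduced, and the function names are illustrative.

```python
def integral_image(image):
    """Return a (H+1) x (W+1) table where entry [y][x] is the sum of image[:y][:x]."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (
                image[y][x] + table[y][x + 1] + table[y + 1][x] - table[y][x]
            )
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 (four lookups)."""
    return (
        table[bottom][right]
        - table[top][right]
        - table[bottom][left]
        + table[top][left]
    )
```

Once the table is built in a single pass, every window sum costs constant time regardless of window size, which is what makes repeated kernel evaluations in clustering cheap.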
Abstract: An important problem in speech research is the automatic extraction of information about the shape and dimensions of the vocal tract during real-time speech production. We have previously developed Southampton dynamic magnetic resonance imaging (SDMRI) as an approach to the solution of this problem. However, the SDMRI images are very noisy so that shape extraction is a major challenge. In this paper, we address the problem of tongue shape extraction, which poses difficulties because this is a highly deforming non-parametric shape. We show that combining active shape models with the dynamic Hough transform allows the tongue shape to be reliably tracked in the image sequence.
Abstract: This paper proposes a novel stereo vision technique
for top-view book scanners that provides dense 3D point
clouds of page surfaces. This is a precondition for dewarping bound
volumes independently of 2D information on the page. Our method is
based on algorithms that normally require the projection of pattern
sequences with structured light. We use image sequences of the
moving stripe lighting of the top-view scanner instead of an additional
light projection. Thus the stereo vision setup is simplified without
losing measurement accuracy. Furthermore, we improve a surface-model
dewarping method by introducing a difference vector
based on real measurements. Although our proposed method is
inexpensive in both calculation time and hardware requirements,
we present good dewarping results even for difficult examples.
Abstract: Arm detection is one of the fundamental problems in
human motion analysis applications. The arms are considered the
most challenging body parts to detect since their pose and speed
vary in image sequences. Moreover, the arms are often occluded
by other body parts such as the head and torso. In this paper,
histogram-based skin colour segmentation is proposed to detect the
arms in image sequences. Six different colour spaces, namely RGB,
rgb, HSI, TSL, SCT and CIELAB, are evaluated to determine the best
colour space for this segmentation procedure. The evaluation is
divided into three categories: single colour component,
colour without luminance and colour with luminance. The
performance is measured using True Positive (TP) and True Negative
(TN) on 250 images with manual ground truth. The best colour space is
selected based on the highest TN value followed by the highest TP
value.
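The evaluation measures can be illustrated with a small sketch. It assumes that TP and TN are reported as rates over skin and non-skin pixels respectively (TP / (TP + FN) and TN / (TN + FP)); the abstract does not state the exact definition, so this is an assumption, as are the function name and the flat 0/1 mask format.

```python
def tp_tn_rates(predicted, ground_truth):
    """Return (true positive rate, true negative rate) for two flat 0/1 masks."""
    tp = fn = tn = fp = 0
    for p, g in zip(predicted, ground_truth):
        if g == 1 and p == 1:
            tp += 1      # skin pixel correctly labelled as skin
        elif g == 1 and p == 0:
            fn += 1      # skin pixel missed
        elif g == 0 and p == 0:
            tn += 1      # non-skin pixel correctly rejected
        else:
            fp += 1      # non-skin pixel wrongly labelled as skin
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    tn_rate = tn / (tn + fp) if (tn + fp) else 0.0
    return tp_rate, tn_rate
```

Ranking by TN first and TP second, as described above, favours colour spaces that reject non-skin pixels reliably before considering how many skin pixels they recover.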
Abstract: In this paper, a new reversible watermarking method is presented that reduces the size of a stereoscopic image sequence while keeping its content visible. The proposed technique embeds the residuals of the right frames into the corresponding frames of the left sequence, halving the total capacity required. The residual frames may result from a disparity-compensated procedure between the two video streams or from joint motion and disparity compensation. The residuals are usually lossy-compressed before embedding because of the limited embedding capacity of the left frames. The watermarked frames are visible at high quality, and at any instant the stereoscopic video may be recovered by an inverse process. In fact, the left frames may be exactly recovered, whereas the right ones are slightly distorted because the residuals are not embedded intact. The employed embedding method reorders the left frame into an array of consecutive pixel pairs and embeds a number of bits according to their intensity difference. In this way, it hides a small number of bits in areas of smooth intensity and most of the data in textured areas, where the resulting distortions are less visible. The experimental evaluation demonstrates that the proposed scheme is quite effective.
Abstract: We study in this paper the effect of scene
changes on an image sequence coding system using the Embedded
Zerotree Wavelet (EZW). The scene change considered here is the
full motion that may occur. A special image sequence is generated
in which scene changes occur randomly. Two scenarios are
considered. In the first scenario, the system must provide the
best possible reconstruction quality by managing the
bit rate (BR) while the scene change occurs. In the second scenario,
the system must keep the bit rate as constant as possible by
managing the reconstruction quality. The first scenario may be
motivated by the availability of a wide-bandwidth transmission
channel, where the bit rate can be increased to keep the
reconstruction quality up to a given threshold. The second scenario
concerns a narrow-bandwidth transmission channel,
where an increase of the bit rate is not possible. In this last case,
applications for which the reconstruction quality is not a constraint
may be considered. The simulations are performed with a five-scale
wavelet decomposition using the 9/7-tap biorthogonal wavelet
filter bank. The entropy coding is performed using a specifically
defined binary code book and the EZW algorithm. Experimental results are
presented and compared to LEAD H263 EVAL. It is shown that if
the reconstruction quality is the constraint, the system increases the
bit rate to obtain the required quality. In the case where the bit rate
must be constant, the system is unable to provide the required quality
when a scene change occurs; however, the system is able to improve
the quality once the scene change has passed.
Abstract: Hand gesture recognition is an active area of research in the
vision community, mainly for the purpose of sign language recognition
and Human Computer Interaction. In this paper, we propose a system to
recognize alphabet characters (A-Z) and numbers (0-9) in real time
from stereo color image sequences using Hidden Markov Models
(HMMs). Our system is based on three main stages: automatic segmentation
and preprocessing of the hand regions, feature extraction,
and classification. In the automatic segmentation and preprocessing
stage, color and a 3D depth map are used to detect the hands, whose
trajectory is then tracked in a further step using the mean-shift
algorithm and a Kalman filter. In the feature extraction stage, combined
3D features of location, orientation and velocity with respect to Cartesian
systems are used. k-means clustering is then employed to obtain the
HMM codewords. In the final stage, classification, the Baum-Welch
algorithm is used to fully train the HMM parameters.
The gestures of alphabet characters and numbers are recognized using a
Left-Right Banded model in conjunction with the Viterbi algorithm.
Experimental results demonstrate that our system can successfully
recognize hand gestures with a 98.33% recognition rate.
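The decoding step mentioned in this abstract can be sketched with a small Viterbi implementation for a discrete HMM. The log-domain formulation, the function names, and the way probabilities are passed in are illustrative assumptions; the paper's trained Left-Right Banded models and codebooks are not reproduced.

```python
import math


def safe_log(p):
    """Logarithm that maps probability 0 to -inf instead of raising an error."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(observations, start, trans, emit):
    """Return (most likely state path, its log probability) for a discrete HMM.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting codeword o in state s
    """
    n_states = len(start)
    score = [
        safe_log(start[s]) + safe_log(emit[s][observations[0]])
        for s in range(n_states)
    ]
    backpointers = []
    for obs in observations[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states), key=lambda p: prev[p] + safe_log(trans[p][s]))
            pointers.append(best)
            score.append(prev[best] + safe_log(trans[best][s]) + safe_log(emit[s][obs]))
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

For classification, one would typically run the decoder against the model of each gesture class and pick the class whose model gives the highest score for the observed codeword sequence.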
Abstract: Gesture recognition, i.e., extracting meaningful gestures from continuous hand motion, is a challenging task. In this paper, we propose an automatic system that recognizes isolated gestures as well as meaningful gestures from continuous hand motion for Arabic numbers from 0 to 9 in real time based on Hidden Markov Models (HMMs). To handle isolated gestures, HMMs with Ergodic, Left-Right (LR) and Left-Right Banded (LRB) topologies are applied to the discrete feature vectors extracted from stereo color image sequences. These topologies are evaluated with different numbers of states, ranging from 3 to 10. A new system is developed to recognize meaningful gestures based on zero-codeword detection with static velocity motion for continuous gestures. The LRB topology, in conjunction with the Baum-Welch (BW) algorithm for training and the forward algorithm with Viterbi path for testing, gives the best performance. Experimental results show that the proposed system can successfully recognize isolated and meaningful gestures, achieving average recognition rates of 98.6% and 94.29%, respectively.
Abstract: In this paper, we propose a novel adaptive
spatiotemporal filter that uses image sequences to remove
noise. The consecutive frames include the current, previous and next
noisy frames. The proposed filter is based on weighted averaging of
pixel intensities and on the noise variance in image
sequences. It uses the Appropriate Number of Consecutive Frames
(ANCF), based on the noisy pixel intensities among the frames. The
number of consecutive frames is calculated adaptively for each
region in the image, and its value may change from one region to
another depending on the pixel intensities within the region. The
weights are determined by a well-defined mathematical criterion,
which adapts to the features of the spatiotemporal pixels of the
consecutive frames. It is experimentally shown that the proposed
filter can preserve image structures and edges under motion while
suppressing noise, and thus can be used effectively for filtering
image sequences. In addition, the adaptive weighted averaging (AWA)
filter using ANCF is particularly well suited for filtering sequences
that contain segments with abruptly changing scene content due to,
for example, rapid zooming and changes in the view of the camera.
Abstract: In this paper, we propose a novel spatiotemporal fuzzy-based algorithm for noise filtering of image sequences. The proposed algorithm uses adaptive weights based on triangular membership functions, and a median filter is used to suppress noise. Experimental results show that when the images are corrupted by high-density salt-and-pepper noise, our fuzzy-based algorithm is much more effective in suppressing noise and preserving edges than previously reported algorithms such as [1-7]. Indeed, the weights assigned to noisy pixels are highly adaptive, so they make good use of the correlation between pixels. Motion estimation methods, on the other hand, are error-prone, and in high-density noise they may degrade the filter performance. Therefore, our proposed fuzzy algorithm does not need any estimation of motion trajectory. The proposed algorithm removes noise acceptably without any knowledge of the salt-and-pepper noise density.
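As a point of reference for the median step, the sketch below applies a plain 3x3x3 spatiotemporal median over the previous, current and next frames, without motion estimation. It is a simplified baseline, not the fuzzy-weighted algorithm of the paper: the triangular membership weights are omitted, and the window size, border clamping and function name are assumptions.

```python
from statistics import median


def spatiotemporal_median(prev_frame, cur_frame, next_frame):
    """Filter cur_frame with the median of a 3x3 window taken over three frames."""
    height, width = len(cur_frame), len(cur_frame[0])
    filtered = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for frame in (prev_frame, cur_frame, next_frame):
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        # Clamp coordinates so border pixels reuse their neighbours.
                        yy = min(max(y + dy, 0), height - 1)
                        xx = min(max(x + dx, 0), width - 1)
                        window.append(frame[yy][xx])
            filtered[y][x] = median(window)  # 27 samples, so the median is a sample
    return filtered
```

Because isolated salt (255) or pepper (0) pixels are outliers within the 27-sample window, the median replaces them with a typical neighbouring intensity.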
Abstract: In this paper, we present a comparative study of two computer vision systems for object recognition and tracking. The two algorithms follow different approaches based on regions, each constituted by a set of pixels, which parameterize objects in shot sequences. For image segmentation and object detection, the FCM technique is used; the overlap between cluster distributions is minimized by using a suitable color space (other than RGB). The first technique takes into account a priori probabilities governing the computation of the various clusters to track objects. A Parzen kernel method is described that allows the players to be identified in each frame; we also show the importance of searching for the standard deviation value of the Gaussian probability density function. Region matching is carried out by an algorithm that operates on the Mahalanobis distance between region descriptors in two subsequent frames and uses singular value decomposition to compute a set of correspondences satisfying both the principle of proximity and the principle of exclusion.
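The distance used for region matching can be illustrated with a minimal sketch. It assumes two-dimensional region descriptors (for example, a centroid) and a known 2x2 covariance matrix; the descriptor choice and the function name are hypothetical, and the SVD-based correspondence step is not shown.

```python
import math


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-D descriptors a and b under covariance cov."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    (s00, s01), (s10, s11) = cov
    det = s00 * s11 - s01 * s10
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # d^T * inverse(cov) * d, with the 2x2 inverse written out explicitly.
    quad = (dx * (s11 * dx - s01 * dy) + dy * (-s10 * dx + s00 * dy)) / det
    return math.sqrt(quad)
```

Unlike the Euclidean distance, this measure scales each direction by the descriptor's variance, so a displacement along a high-variance axis counts less than the same displacement along a low-variance axis.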