Motion-Based Detection and Tracking of Multiple Pedestrians

Tracking of moving people has gained a matter of great importance due to rapid technological advancements in the field of computer vision. The objective of this study is to design a motion based detection and tracking multiple walking pedestrians randomly in different directions. In our proposed method, Gaussian mixture model (GMM) is used to determine moving persons in image sequences. It reacts to changes that take place in the scene like different illumination; moving objects start and stop often, etc. Background noise in the scene is eliminated through applying morphological operations and the motions of tracked people which is determined by using the Kalman filter. The Kalman filter is applied to predict the tracked location in each frame and to determine the likelihood of each detection. We used a benchmark data set for the evaluation based on a side wall stationary camera. The actual scenes from the data set are taken on a street including up to eight people in front of the camera in different two scenes, the duration is 53 and 35 seconds, respectively. In the case of walking pedestrians in close proximity, the proposed method has achieved the detection ratio of 87%, and the tracking ratio is 77 % successfully. When they are deferred from each other, the detection ratio is increased to 90% and the tracking ratio is also increased to 79%.

Hybrid Temporal Correlation Based on Gaussian Mixture Model Framework for View Synthesis

As 3D video is explored as a hot research topic in the last few decades, free-viewpoint TV (FTV) is no doubt a promising field for its better visual experience and incomparable interactivity. View synthesis is obviously a crucial technology for FTV; it enables to render images in unlimited numbers of virtual viewpoints with the information from limited numbers of reference view. In this paper, a novel hybrid synthesis framework is proposed and blending priority is explored. In contrast to the commonly used View Synthesis Reference Software (VSRS), the presented synthesis process is driven in consideration of the temporal correlation of image sequences. The temporal correlations will be exploited to produce fine synthesis results even near the foreground boundaries. As for the blending priority, this scheme proposed that one of the two reference views is selected to be the main reference view based on the distance between the reference views and virtual view, another view is chosen as the auxiliary viewpoint, just assist to fill the hole pixel with the help of background information. Significant improvement of the proposed approach over the state-of –the-art pixel-based virtual view synthesis method is presented, the results of the experiments show that subjective gains can be observed, and objective PSNR average gains range from 0.5 to 1.3 dB, while SSIM average gains range from 0.01 to 0.05.

Retrieving Extended High Dynamic Range from Digital Negative Image - An Experiment on Architectural Photo Imaging

The paper explores the development of an optimization of method and apparatus for retrieving extended high dynamic range from digital negative image. Architectural photo imaging can benefit from high dynamic range imaging (HDRI) technique for preserving and presenting sufficient luminance in the shadow and highlight clipping image areas. The HDRI technique that requires multiple exposure images as the source of HDRI rendering may not be effective in terms of time efficiency during the acquisition process and post-processing stage, considering it has numerous potential imaging variables and technical limitations during the multiple exposure process. This paper explores an experimental method and apparatus that aims to expand the dynamic range from digital negative image in HDRI environment. The method and apparatus explored is based on a single source of RAW image acquisition for the use of HDRI post-processing. It will cater the optimization in order to avoid and minimize the conventional HDRI photographic errors caused by different physical conditions during the photographing process and the misalignment of multiple exposed image sequences. The study observes the characteristics and capabilities of RAW image format as digital negative used for the retrieval of extended high dynamic range process in HDRI environment.

Facial Expressions Animation and Lip Tracking Using Facial Characteristic Points and Deformable Model

Face and facial expressions play essential roles in interpersonal communication. Most of the current works on the facial expression recognition attempt to recognize a small set of the prototypic expressions such as happy, surprise, anger, sad, disgust and fear. However the most of the human emotions are communicated by changes in one or two of discrete features. In this paper, we develop a facial expressions synthesis system, based on the facial characteristic points (FCP's) tracking in the frontal image sequences. Selected FCP's are automatically tracked using a crosscorrelation based optical flow. The proposed synthesis system uses a simple deformable facial features model with a few set of control points that can be tracked in original facial image sequences.

A Novel Multiresolution based Optimization Scheme for Robust Affine Parameter Estimation

This paper describes a new method for affine parameter estimation between image sequences. Usually, the parameter estimation techniques can be done by least squares in a quadratic way. However, this technique can be sensitive to the presence of outliers. Therefore, parameter estimation techniques for various image processing applications are robust enough to withstand the influence of outliers. Progressively, some robust estimation functions demanding non-quadratic and perhaps non-convex potentials adopted from statistics literature have been used for solving these. Addressing the optimization of the error function in a factual framework for finding a global optimal solution, the minimization can begin with the convex estimator at the coarser level and gradually introduce nonconvexity i.e., from soft to hard redescending non-convex estimators when the iteration reaches finer level of multiresolution pyramid. Comparison has been made to find the performance of the results of proposed method with the results found individually using two different estimators.

A Reliable FPGA-based Real-time Optical-flow Estimation

Optical flow is a research topic of interest for many years. It has, until recently, been largely inapplicable to real-time applications due to its computationally expensive nature. This paper presents a new reliable flow technique which is combined with a motion detection algorithm, from stationary camera image streams, to allow flow-based analyses of moving entities, such as rigidity, in real-time. The combination of the optical flow analysis with motion detection technique greatly reduces the expensive computation of flow vectors as compared with standard approaches, rendering the method to be applicable in real-time implementation. This paper describes also the hardware implementation of a proposed pipelined system to estimate the flow vectors from image sequences in real time. This design can process 768 x 576 images at a very high frame rate that reaches to 156 fps in a single low cost FPGA chip, which is adequate for most real-time vision applications.

Extracting Human Body based on Background Estimation in Modified HLS Color Space

The ability to recognize humans and their activities by computer vision is a very important task, with many potential application. Study of human motion analysis is related to several research areas of computer vision such as the motion capture, detection, tracking and segmentation of people. In this paper, we describe a segmentation method for extracting human body contour in modified HLS color space. To estimate a background, the modified HLS color space is proposed, and the background features are estimated by using the HLS color components. Here, the large amount of human dataset, which was collected from DV cameras, is pre-processed. The human body and its contour is successfully extracted from the image sequences.

A Robust Method for Hand Tracking Using Mean-shift Algorithm and Kalman Filter in Stereo Color Image Sequences

Real-time hand tracking is a challenging task in many computer vision applications such as gesture recognition. This paper proposes a robust method for hand tracking in a complex environment using Mean-shift analysis and Kalman filter in conjunction with 3D depth map. The depth information solve the overlapping problem between hands and face, which is obtained by passive stereo measuring based on cross correlation and the known calibration data of the cameras. Mean-shift analysis uses the gradient of Bhattacharyya coefficient as a similarity function to derive the candidate of the hand that is most similar to a given hand target model. And then, Kalman filter is used to estimate the position of the hand target. The results of hand tracking, tested on various video sequences, are robust to changes in shape as well as partial occlusion.

Real-time Tracking in Image Sequences based-on Parameters Updating with Temporal and Spatial Neighborhoods Mixture Gaussian Model

Gaussian mixture background model is widely used in moving target detection of the image sequences. However, traditional Gaussian mixture background model usually considers the time continuity of the pixels, and establishes background through statistical distribution of pixels without taking into account the pixels- spatial similarity, which will cause noise, imperfection and other problems. This paper proposes a new Gaussian mixture modeling approach, which combines the color and gradient of the spatial information, and integrates the spatial information of the pixel sequences to establish Gaussian mixture background. The experimental results show that the movement background can be extracted accurately and efficiently, and the algorithm is more robust, and can work in real time in tracking applications.

Efficient Mean Shift Clustering Using Exponential Integral Kernels

This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied on backgrounddifferenced image sequences. For efficiency, all calculations are performed on integral images. Novel corresponding exponential integral kernels are introduced to allow the application of nonuniform kernels for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.

Extracting Tongue Shape Dynamics from Magnetic Resonance Image Sequences

An important problem in speech research is the automatic extraction of information about the shape and dimensions of the vocal tract during real-time speech production. We have previously developed Southampton dynamic magnetic resonance imaging (SDMRI) as an approach to the solution of this problem.However, the SDMRI images are very noisy so that shape extraction is a major challenge. In this paper, we address the problem of tongue shape extraction, which poses difficulties because this is a highly deforming non-parametric shape. We show that combining active shape models with the dynamic Hough transform allows the tongue shape to be reliably tracked in the image sequence.

A Stereo Vision System for Top View Book Scanners

This paper proposes a novel stereo vision technique for top view book scanners which provide us with dense 3d point clouds of page surfaces. This is a precondition to dewarp bound volumes independent of 2d information on the page. Our method is based on algorithms, which normally require the projection of pattern sequences with structured light. We use image sequences of the moving stripe lighting of the top view scanner instead of an additional light projection. Thus the stereo vision setup is simplified without losing measurement accuracy. Furthermore we improve a surface model dewarping method through introducing a difference vector based on real measurements. Although our proposed method is hardly expensive neither in calculation time nor in hardware requirements we present good dewarping results even for difficult examples.

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Arms detection is one of the fundamental problems in human motion analysis application. The arms are considered as the most challenging body part to be detected since its pose and speed varies in image sequences. Moreover, the arms are usually occluded with other body parts such as the head and torso. In this paper, histogram-based skin colour segmentation is proposed to detect the arms in image sequences. Six different colour spaces namely RGB, rgb, HSI, TSL, SCT and CIELAB are evaluated to determine the best colour space for this segmentation procedure. The evaluation is divided into three categories, which are single colour component, colour without luminance and colour with luminance. The performance is measured using True Positive (TP) and True Negative (TN) on 250 images with manual ground truth. The best colour is selected based on the highest TN value followed by the highest TP value.

Reversible Watermarking on Stereo Image Sequences

In this paper, a new reversible watermarking method is presented that reduces the size of a stereoscopic image sequence while keeping its content visible. The proposed technique embeds the residuals of the right frames to the corresponding frames of the left sequence, halving the total capacity. The residual frames may result in after a disparity compensated procedure between the two video streams or by a joint motion and disparity compensation. The residuals are usually lossy compressed before embedding because of the limited embedding capacity of the left frames. The watermarked frames are visible at a high quality and at any instant the stereoscopic video may be recovered by an inverse process. In fact, the left frames may be exactly recovered whereas the right ones are slightly distorted as the residuals are not embedded intact. The employed embedding method reorders the left frame into an array of consecutive pixel pairs and embeds a number of bits according to their intensity difference. In this way, it hides a number of bits in intensity smooth areas and most of the data in textured areas where resulting distortions are less visible. The experimental evaluation demonstrates that the proposed scheme is quite effective.

Effect of Scene Changing on Image Sequences Compression Using Zero Tree Coding

We study in this paper the effect of the scene changing on image sequences coding system using Embedded Zerotree Wavelet (EZW). The scene changing considered here is the full motion which may occurs. A special image sequence is generated where the scene changing occurs randomly. Two scenarios are considered: In the first scenario, the system must provide the reconstruction quality as best as possible by the management of the bit rate (BR) while the scene changing occurs. In the second scenario, the system must keep the bit rate as constant as possible by the management of the reconstruction quality. The first scenario may be motivated by the availability of a large band pass transmission channel where an increase of the bit rate may be possible to keep the reconstruction quality up to a given threshold. The second scenario may be concerned by the narrow band pass transmission channel where an increase of the bit rate is not possible. In this last case, applications for which the reconstruction quality is not a constraint may be considered. The simulations are performed with five scales wavelet decomposition using the 9/7-tap filter bank biorthogonal wavelet. The entropy coding is performed using a specific defined binary code book and EZW algorithm. Experimental results are presented and compared to LEAD H263 EVAL. It is shown that if the reconstruction quality is the constraint, the system increases the bit rate to obtain the required quality. In the case where the bit rate must be constant, the system is unable to provide the required quality if the scene change occurs; however, the system is able to improve the quality while the scene changing disappears.

Hand Gesture Recognition Based on Combined Features Extraction

Hand gesture is an active area of research in the vision community, mainly for the purpose of sign language recognition and Human Computer Interaction. In this paper, we propose a system to recognize alphabet characters (A-Z) and numbers (0-9) in real-time from stereo color image sequences using Hidden Markov Models (HMMs). Our system is based on three main stages; automatic segmentation and preprocessing of the hand regions, feature extraction and classification. In automatic segmentation and preprocessing stage, color and 3D depth map are used to detect hands where the hand trajectory will take place in further step using Mean-shift algorithm and Kalman filter. In the feature extraction stage, 3D combined features of location, orientation and velocity with respected to Cartesian systems are used. And then, k-means clustering is employed for HMMs codeword. The final stage so-called classification, Baum- Welch algorithm is used to do a full train for HMMs parameters. The gesture of alphabets and numbers is recognized using Left-Right Banded model in conjunction with Viterbi algorithm. Experimental results demonstrate that, our system can successfully recognize hand gestures with 98.33% recognition rate.

A Hidden Markov Model-Based Isolated and Meaningful Hand Gesture Recognition

Gesture recognition is a challenging task for extracting meaningful gesture from continuous hand motion. In this paper, we propose an automatic system that recognizes isolated gesture, in addition meaningful gesture from continuous hand motion for Arabic numbers from 0 to 9 in real-time based on Hidden Markov Models (HMM). In order to handle isolated gesture, HMM using Ergodic, Left-Right (LR) and Left-Right Banded (LRB) topologies is applied over the discrete vector feature that is extracted from stereo color image sequences. These topologies are considered to different number of states ranging from 3 to 10. A new system is developed to recognize the meaningful gesture based on zero-codeword detection with static velocity motion for continuous gesture. Therefore, the LRB topology in conjunction with Baum-Welch (BW) algorithm for training and forward algorithm with Viterbi path for testing presents the best performance. Experimental results show that the proposed system can successfully recognize isolated and meaningful gesture and achieve average rate recognition 98.6% and 94.29% respectively.

Adaptive Weighted Averaging Filter Using the Appropriate Number of Consecutive Frames

In this paper, we propose a novel adaptive spatiotemporal filter that utilizes image sequences in order to remove noise. The consecutive frames include: current, previous and next noisy frames. The filter proposed in this paper is based upon the weighted averaging pixels intensity and noise variance in image sequences. It utilizes the Appropriate Number of Consecutive Frames (ANCF) based on the noisy pixels intensity among the frames. The number of consecutive frames is adaptively calculated for each region in image and its value may change from one region to another region depending on the pixels intensity within the region. The weights are determined by a well-defined mathematical criterion, which is adaptive to the feature of spatiotemporal pixels of the consecutive frames. It is experimentally shown that the proposed filter can preserve image structures and edges under motion while suppressing noise, and thus can be effectively used in image sequences filtering. In addition, the AWA filter using ANCF is particularly well suited for filtering sequences that contain segments with abruptly changing scene content due to, for example, rapid zooming and changes in the view of the camera.

Noise Reduction in Image Sequences using an Effective Fuzzy Algorithm

In this paper, we propose a novel spatiotemporal fuzzy based algorithm for noise filtering of image sequences. Our proposed algorithm uses adaptive weights based on a triangular membership functions. In this algorithm median filter is used to suppress noise. Experimental results show when the images are corrupted by highdensity Salt and Pepper noise, our fuzzy based algorithm for noise filtering of image sequences, are much more effective in suppressing noise and preserving edges than the previously reported algorithms such as [1-7]. Indeed, assigned weights to noisy pixels are very adaptive so that they well make use of correlation of pixels. On the other hand, the motion estimation methods are erroneous and in highdensity noise they may degrade the filter performance. Therefore, our proposed fuzzy algorithm doesn-t need any estimation of motion trajectory. The proposed algorithm admissibly removes noise without having any knowledge of Salt and Pepper noise density.

Tracking Objects in Color Image Sequences: Application to Football Images

In this paper, we present a comparative study between two computer vision systems for objects recognition and tracking, these algorithms describe two different approach based on regions constituted by a set of pixels which parameterized objects in shot sequences. For the image segmentation and objects detection, the FCM technique is used, the overlapping between cluster's distribution is minimized by the use of suitable color space (other that the RGB one). The first technique takes into account a priori probabilities governing the computation of various clusters to track objects. A Parzen kernel method is described and allows identifying the players in each frame, we also show the importance of standard deviation value research of the Gaussian probability density function. Region matching is carried out by an algorithm that operates on the Mahalanobis distance between region descriptors in two subsequent frames and uses singular value decomposition to compute a set of correspondences satisfying both the principle of proximity and the principle of exclusion.