Vision Based People Tracking System

In this paper we present the design and the implementation of a target tracking system where the target is set to be a moving person in a video sequence. The system can be applied easily as a vision system for mobile robot. The system is composed of two major parts the first is the detection of the person in the video frame using the SVM learning machine based on the “HOG” descriptors. The second part is the tracking of a moving person it’s done by using a combination of the Kalman filter and a modified version of the Camshift tracking algorithm by adding the target motion feature to the color feature, the experimental results had shown that the new algorithm had overcame the traditional Camshift algorithm in robustness and in case of occlusion.

Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

The large pose discrepancy is one of the critical challenges in face recognition during video surveillance. Due to the entanglement of pose attributes with identity information, the conventional approaches for pose-independent representation lack in providing quality results in recognizing largely posed faces. In this paper, we propose a practical approach to disentangle the pose attribute from the identity information followed by synthesis of a face using a classifier network in latent space. The proposed approach employs a modified generative adversarial network framework consisting of an encoder-decoder structure embedded with a classifier in manifold space for carrying out factorization on the latent encoding. It can be further generalized to other face and non-face attributes for real-life video frames containing faces with significant attribute variations. Experimental results and comparison with state of the art in the field prove that the learned representation of the proposed approach synthesizes more compelling perceptual images through a combination of adversarial and classification losses.

The Impact of Temporal Impairment on Quality of Experience (QoE) in Video Streaming: A No Reference (NR) Subjective and Objective Study

Live video streaming is one of the most widely used service among end users, yet it is a big challenge for the network operators in terms of quality. The only way to provide excellent Quality of Experience (QoE) to the end users is continuous monitoring of live video streaming. For this purpose, there are several objective algorithms available that monitor the quality of the video in a live stream. Subjective tests play a very important role in fine tuning the results of objective algorithms. As human perception is considered to be the most reliable source for assessing the quality of a video stream subjective tests are conducted in order to develop more reliable objective algorithms. Temporal impairments in a live video stream can have a negative impact on the end users. In this paper we have conducted subjective evaluation tests on a set of video sequences containing temporal impairment known as frame freezing. Frame Freezing is considered as a transmission error as well as a hardware error which can result in loss of video frames on the reception side of a transmission system. In our subjective tests, we have performed tests on videos that contain a single freezing event and also for videos that contain multiple freezing events. We have recorded our subjective test results for all the videos in order to give a comparison on the available No Reference (NR) objective algorithms. Finally, we have shown the performance of no reference algorithms used for objective evaluation of videos and suggested the algorithm that works better. The outcome of this study shows the importance of QoE and its effect on human perception. The results for the subjective evaluation can serve the purpose for validating objective algorithms.

A Four-Step Ortho-Rectification Procedure for Geo-Referencing Video Streams from a Low-Cost UAV

In this paper, we present a four-step ortho-rectification procedure for real-time geo-referencing of video data from a low-cost UAV equipped with a multi-sensor system. The basic procedures for the real-time ortho-rectification are: (1) decompilation of the video stream into individual frames; (2) establishing the interior camera orientation parameters; (3) determining the relative orientation parameters for each video frame with respect to each other; (4) finding the absolute orientation parameters, using a self-calibration bundle and adjustment with the aid of a mathematical model. Each ortho-rectified video frame is then mosaicked together to produce a mosaic image of the test area, which is then merged with a well referenced existing digital map for the purpose of geo-referencing and aerial surveillance. A test field located in Abuja, Nigeria was used to evaluate our method. Video and telemetry data were collected for about fifteen minutes, and they were processed using the four-step ortho-rectification procedure. The results demonstrated that the geometric measurement of the control field from ortho-images is more accurate when compared with those from original perspective images when used to pin point the exact location of targets on the video imagery acquired by the UAV. The 2-D planimetric accuracy when compared with the 6 control points measured by a GPS receiver is between 3 to 5 metres.

Video Shot Detection and Key Frame Extraction Using Faber Shauder DWT and SVD

Key frame extraction methods select the most representative frames of a video, which can be used in different areas of video processing such as video retrieval, video summary, and video indexing. In this paper we present a novel approach for extracting key frames from video sequences. The frame is characterized uniquely by his contours which are represented by the dominant blocks. These dominant blocks are located on the contours and its near textures. When the video frames have a noticeable changement, its dominant blocks changed, then we can extracte a key frame. The dominant blocks of every frame is computed, and then feature vectors are extracted from the dominant blocks image of each frame and arranged in a feature matrix. Singular Value Decomposition is used to calculate sliding windows ranks of those matrices. Finally the computed ranks are traced and then we are able to extract key frames of a video. Experimental results show that the proposed approach is robust against a large range of digital effects used during shot transition.

Face Recognition Based On Vector Quantization Using Fuzzy Neuro Clustering

A face recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame. A lot of algorithms have been proposed for face recognition. Vector Quantization (VQ) based face recognition is a novel approach for face recognition. Here a new codebook generation for VQ based face recognition using Integrated Adaptive Fuzzy Clustering (IAFC) is proposed. IAFC is a fuzzy neural network which incorporates a fuzzy learning rule into a competitive neural network. The performance of proposed algorithm is demonstrated by using publicly available AT&T database, Yale database, Indian Face database and a small face database, DCSKU database created in our lab. In all the databases the proposed approach got a higher recognition rate than most of the existing methods. In terms of Equal Error Rate (ERR) also the proposed codebook is better than the existing methods.

Video-Based Face Recognition Based On State-Space Model

This paper proposes a video-based framework for face recognition to identify which faces appear in a video sequence. Our basic idea is like a tracking task - to track a selection of person candidates over time according to the observing visual features of face images in video frames. Hence, we employ the state-space model to formulate video-based face recognition by dividing this problem into two parts: the likelihood and the transition measures. The likelihood measure is to recognize whose face is currently being observed in video frames, for which two-dimensional linear discriminant analysis is employed. The transition measure estimates the probability of changing from an incorrect recognition at the previous stage to the correct person at the current stage. Moreover, extra nodes associated with head nodes are incorporated into our proposed state-space model. The experimental results are also provided to demonstrate the robustness and efficiency of our proposed approach.

Novel Security Strategy for Real Time Digital Videos

Now a days video data embedding approach is a very challenging and interesting task towards keeping real time video data secure. We can implement and use this technique with high-level applications. As the rate-distortion of any image is not confirmed, because the gain provided by accurate image frame segmentation are balanced by the inefficiency of coding objects of arbitrary shape, with a lot factors like losses that depend on both the coding scheme and the object structure. By using rate controller in association with the encoder one can dynamically adjust the target bitrate. This paper discusses about to keep secure videos by mixing signature data with negligible distortion in the original video, and to keep steganographic video as closely as possible to the quality of the original video. In this discussion we propose the method for embedding the signature data into separate video frames by the use of block Discrete Cosine Transform. These frames are then encoded by real time encoding H.264 scheme concepts. After processing, at receiver end recovery of original video and the signature data is proposed.

Comparative Evaluation of Color-Based Video Signatures in the Presence of Various Distortion Types

The robustness of color-based signatures in the presence of a selection of representative distortions is investigated. Considered are five signatures that have been developed and evaluated within a new modular framework. Two signatures presented in this work are directly derived from histograms gathered from video frames. The other three signatures are based on temporal information by computing difference histograms between adjacent frames. In order to obtain objective and reproducible results, the evaluations are conducted based on several randomly assembled test sets. These test sets are extracted from a video repository that contains a wide range of broadcast content including documentaries, sports, news, movies, etc. Overall, the experimental results show the adequacy of color-histogram-based signatures for video fingerprinting applications and indicate which type of signature should be preferred in the presence of certain distortions.

Quality Estimation of Video Transmitted overan Additive WGN Channel based on Digital Watermarking and Wavelet Transform

This paper presents an evaluation for a wavelet-based digital watermarking technique used in estimating the quality of video sequences transmitted over Additive White Gaussian Noise (AWGN) channel in terms of a classical objective metric, such as Peak Signal-to-Noise Ratio (PSNR) without the need of the original video. In this method, a watermark is embedded into the Discrete Wavelet Transform (DWT) domain of the original video frames using a quantization method. The degradation of the extracted watermark can be used to estimate the video quality in terms of PSNR with good accuracy. We calculated PSNR for video frames contaminated with AWGN and compared the values with those estimated using the Watermarking-DWT based approach. It is found that the calculated and estimated quality measures of the video frames are highly correlated, suggesting that this method can provide a good quality measure for video frames transmitted over AWGN channel without the need of the original video.

Enhanced Frame-based Video Coding to Support Content-based Functionalities

This paper presents the enhanced frame-based video coding scheme. The input source video to the enhanced frame-based video encoder consists of a rectangular-size video and shapes of arbitrarily-shaped objects on video frames. The rectangular frame texture is encoded by the conventional frame-based coding technique and the video object-s shape is encoded using the contour-based vertex coding. It is possible to achieve several useful content-based functionalities by utilizing the shape information in the bitstream at the cost of a very small overhead to the bitrate.

Face Reconstruction and Camera Pose Using Multi-dimensional Descent

This paper aims to propose a novel, robust, and simple method for obtaining a human 3D face model and camera pose (position and orientation) from a video sequence. Given a video sequence of a face recorded from an off-the-shelf digital camera, feature points used to define facial parts are tracked using the Active- Appearance Model (AAM). Then, the face-s 3D structure and camera pose of each video frame can be simultaneously calculated from the obtained point correspondences. This proposed method is primarily based on the combined approaches of Gradient Descent and Powell-s Multidimensional Minimization. Using this proposed method, temporarily occluded point including the case of self-occlusion does not pose a problem. As long as the point correspondences displayed in the video sequence have enough parallax, these missing points can still be reconstructed.

Design of FIR Filter for Water Level Detection

This paper proposes a new design of spatial FIR filter to automatically detect water level from a video signal of various river surroundings. A new approach in this report applies "addition" of frames and a "horizontal" edge detector to distinguish water region and land region. Variance of each line of a filtered video frame is used as a feature value. The water level is recognized as a boundary line between the land region and the water region. Edge detection filter essentially demarcates between two distinctly different regions. However, the conventional filters are not automatically adaptive to detect water level in various lighting conditions of river scenery. An optimized filter is purposed so that the system becomes robust to changes of lighting condition. More reliability of the proposed system with the optimized filter is confirmed by accuracy of water level detection.

Fast Search for MPEG Video Clips Using Adjacent Pixel Intensity Difference Quantization Histogram Feature

In this paper, we propose a novel fast search algorithm for short MPEG video clips from video database. This algorithm is based on the adjacent pixel intensity difference quantization (APIDQ) algorithm, which had been reliably applied to human face recognition previously. An APIDQ histogram is utilized as the feature vector of the frame image. Instead of fully decompressed video frames, partially decoded data, namely DC images are utilized. Combined with active search [4], a temporal pruning algorithm, fast and robust video search can be realized. The proposed search algorithm has been evaluated by 6 hours of video to search for given 200 MPEG video clips which each length is 15 seconds. Experimental results show the proposed algorithm can detect the similar video clip in merely 80ms, and Equal Error Rate (ERR) of 3 % is achieved, which is more accurately and robust than conventional fast video search algorithm.

Motion Analysis for Duplicate Frame Removal in Wireless Capsule Endoscope Video

Wireless capsule Endoscopy (WCE) has rapidly shown its wide applications in medical domain last ten years thanks to its noninvasiveness for patients and support for thorough inspection through a patient-s entire digestive system including small intestine. However, one of the main barriers to efficient clinical inspection procedure is that it requires large amount of effort for clinicians to inspect huge data collected during the examination, i.e., over 55,000 frames in video. In this paper, we propose a method to compute meaningful motion changes of WCE by analyzing the obtained video frames based on regional optical flow estimations. The computed motion vectors are used to remove duplicate video frames caused by WCE-s imaging nature, such as repetitive forward-backward motions from peristaltic movements. The motion vectors are derived by calculating directional component vectors in four local regions. Our experiments are performed on small intestine area, which is of main interest to clinical experts when using WCEs, and our experimental results show significant frame reductions comparing with a simple frame-to-frame similarity-based image reduction method.

Implementing a Visual Servoing System for Robot Controlling

Nowadays, with the emerging of the new applications like robot control in image processing, artificial vision for visual servoing is a rapidly growing discipline and Human-machine interaction plays a significant role for controlling the robot. This paper presents a new algorithm based on spatio-temporal volumes for visual servoing aims to control robots. In this algorithm, after applying necessary pre-processing on video frames, a spatio-temporal volume is constructed for each gesture and feature vector is extracted. These volumes are then analyzed for matching in two consecutive stages. For hand gesture recognition and classification we tested different classifiers including k-Nearest neighbor, learning vector quantization and back propagation neural networks. We tested the proposed algorithm with the collected data set and results showed the correct gesture recognition rate of 99.58 percent. We also tested the algorithm with noisy images and algorithm showed the correct recognition rate of 97.92 percent in noisy images.

Motions of Multiple Objects Detection Based On Video Frames

This paper introduces an intelligent system, which can be applied in the monitoring of vehicle speed using a single camera. The ability of motion tracking is extremely useful in many automation problems and the solution to this problem will open up many future applications. One of the most common problems in our daily life is the speed detection of vehicles on a highway. In this paper, a novel technique is developed to track multiple moving objects with their speeds being estimated using a sequence of video frames. Field test has been conducted to capture real-life data and the processed results were presented. Multiple object problems and noisy in data are also considered. Implementing this system in real-time is straightforward. The proposal can accurately evaluate the position and the orientation of moving objects in real-time. The transformations and calibration between the 2D image and the actual road are also considered.

Realtime Lip Contour Tracking For Audio-Visual Speech Recognition Applications

Detection and tracking of the lip contour is an important issue in speechreading. While there are solutions for lip tracking once a good contour initialization in the first frame is available, the problem of finding such a good initialization is not yet solved automatically, but done manually. We have developed a new tracking solution for lip contour detection using only few landmarks (15 to 25) and applying the well known Active Shape Models (ASM). The proposed method is a new LMS-like adaptive scheme based on an Auto regressive (AR) model that has been fit on the landmark variations in successive video frames. Moreover, we propose an extra motion compensation model to address more general cases in lip tracking. Computer simulations demonstrate a fair match between the true and the estimated spatial pixels. Significant improvements related to the well known LMS approach has been obtained via a defined Frobenius norm index.

Inter-frame Collusion Attack in SS-N Video Watermarking System

Video watermarking is usually considered as watermarking of a set of still images. In frame-by-frame watermarking approach, each video frame is seen as a single watermarked image, so collusion attack is more critical in video watermarking. If the same or redundant watermark is used for embedding in every frame of video, the watermark can be estimated and then removed by watermark estimate remodolulation (WER) attack. Also if uncorrelated watermarks are used for every frame, these watermarks can be washed out with frame temporal filtering (FTF). Switching watermark system or so-called SS-N system has better performance against WER and FTF attacks. In this system, for each frame, the watermark is randomly picked up from a finite pool of watermark patterns. At first SS-N system will be surveyed and then a new collusion attack for SS-N system will be proposed using a new algorithm for separating video frame based on watermark pattern. So N sets will be built in which every set contains frames carrying the same watermark. After that, using WER attack in every set, N different watermark patterns will be estimated and removed later.

FPGA Implementation of a Vision-Based Blind Spot Warning System

Vision-based intelligent vehicle applications often require large amounts of memory to handle video streaming and image processing, which in turn increases complexity of hardware and software. This paper presents an FPGA implement of a vision-based blind spot warning system. Using video frames, the information of the blind spot area turns into one-dimensional information. Analysis of the estimated entropy of image allows the detection of an object in time. This idea has been implemented in the XtremeDSP video starter kit. The blind spot warning system uses only 13% of its logic resources and 95k bits block memory, and its frame rate is over 30 frames per sec (fps).