Abstract: In this paper we present the design and implementation of a target tracking system in which the target is a moving person in a video sequence. The system can readily serve as a vision system for a mobile robot. It is composed of two major parts. The first is the detection of the person in the video frame using an SVM classifier based on HOG descriptors. The second is the tracking of the moving person, which combines the Kalman filter with a modified version of the Camshift tracking algorithm that adds the target's motion feature to the color feature. The experimental results show that the new algorithm outperforms the traditional Camshift algorithm in robustness and in handling occlusion.
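The Kalman filtering half of such a tracker can be illustrated with a minimal sketch. It assumes a constant-velocity motion model for a single image coordinate and assumes that each measurement is the centre of the window returned by the colour tracker; the class name, the noise variances, and the use of `None` for an occluded frame are hypothetical choices for illustration and are not taken from the paper.

```python
class ConstantVelocityKalman1D:
    """Constant-velocity Kalman filter for one image coordinate (illustrative)."""

    def __init__(self, process_var=0.01, measurement_var=1.0, initial_var=100.0):
        self.pos = 0.0
        self.vel = 0.0
        # 2x2 state covariance P, stored as four scalars.
        self.p00, self.p01, self.p10, self.p11 = initial_var, 0.0, 0.0, initial_var
        self.q = process_var       # assumed process noise variance
        self.r = measurement_var   # assumed measurement noise variance

    def predict(self, dt=1.0):
        """Propagate one frame ahead: x = F x, P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.pos += dt * self.vel
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, measured_pos):
        """Correct the prediction with a position measurement (H = [1, 0])."""
        innovation = measured_pos - self.pos
        s = self.p00 + self.r                    # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s      # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


def track(measurements):
    """Run the filter over a sequence; None marks a frame without a measurement."""
    kf = ConstantVelocityKalman1D()
    estimates = []
    for z in measurements:
        kf.predict()
        if z is not None:
            kf.update(z)
        estimates.append(kf.pos)
    return kf, estimates


# Target moving 2 pixels per frame, then hidden for three frames.
measurements = [2.0 * k for k in range(1, 21)] + [None, None, None]
kf, estimates = track(measurements)
```

During the three frames without a measurement the filter keeps extrapolating along the learned velocity, which is the property that makes a Kalman predictor a natural companion to a colour tracker under occlusion.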
Abstract: Large pose discrepancy is one of the critical
challenges in face recognition during video surveillance. Due to
the entanglement of pose attributes with identity information,
conventional approaches to pose-independent representation fall
short when recognizing faces with large pose variations. In
this paper, we propose a practical approach to disentangle the pose
attribute from the identity information followed by synthesis of a face
using a classifier network in latent space. The proposed approach
employs a modified generative adversarial network framework
consisting of an encoder-decoder structure embedded with a classifier
in manifold space for carrying out factorization on the latent
encoding. It can be further generalized to other face and non-face
attributes for real-life video frames containing faces with significant
attribute variations. Experimental results and comparison with the state
of the art in the field show that the learned representation of the
proposed approach synthesizes more compelling perceptual images
through a combination of adversarial and classification losses.
Abstract: Live video streaming is one of the most widely used
services among end users, yet it is a big challenge for network
operators in terms of quality. The only way to provide excellent
Quality of Experience (QoE) to the end users is continuous
monitoring of live video streaming. For this purpose, there are several
objective algorithms available that monitor the quality of the video in
a live stream. Subjective tests play a very important role in fine
tuning the results of objective algorithms. As human perception is
considered to be the most reliable source for assessing the quality of a
video stream, subjective tests are conducted in order to develop more
reliable objective algorithms. Temporal impairments in a live video
stream can have a negative impact on the end users. In this paper we
have conducted subjective evaluation tests on a set of video
sequences containing temporal impairment known as frame freezing.
Frame freezing is considered a transmission error as well as a
hardware error, and it can result in the loss of video frames on the
reception side of a transmission system. Our subjective tests
cover videos that contain a single freezing event
as well as videos that contain multiple freezing events. We have
recorded our subjective test results for all the videos in order to
compare the available No Reference (NR) objective
algorithms. Finally, we report the performance of the no-reference
algorithms used for objective evaluation of the videos and identify the
algorithm that performs better. The outcome of this study shows the
importance of QoE and its effect on human perception. The results
of the subjective evaluation can be used to validate
objective algorithms.
Abstract: In this paper, we present a four-step ortho-rectification
procedure for real-time geo-referencing of video data from a low-cost
UAV equipped with a multi-sensor system. The basic steps of
the real-time ortho-rectification are: (1) decomposition of the video
stream into individual frames; (2) establishing the interior camera
orientation parameters; (3) determining the relative orientation
parameters for each video frame with respect to each other; (4)
finding the absolute orientation parameters, using a self-calibration
bundle adjustment with the aid of a mathematical model. The
ortho-rectified video frames are then mosaicked together to produce a
mosaic image of the test area, which is then merged with a well-referenced
existing digital map for the purpose of geo-referencing
and aerial surveillance. A test field located in Abuja, Nigeria was
used to evaluate our method. Video and telemetry data were collected
for about fifteen minutes, and they were processed using the four-step
ortho-rectification procedure. The results demonstrated that
geometric measurements of the control field from the ortho-images are
more accurate than those from the original perspective
images when used to pinpoint the exact location of targets on the
video imagery acquired by the UAV. The 2-D planimetric accuracy
when compared with the 6 control points measured by a GPS receiver
is between 3 and 5 metres.
Abstract: Key frame extraction methods select the most
representative frames of a video, which can be used in different areas
of video processing such as video retrieval, video summary, and video
indexing. In this paper we present a novel approach for extracting key
frames from video sequences. Each frame is characterized uniquely by
its contours, which are represented by dominant blocks. These
dominant blocks are located on the contours and their nearby textures.
When the video content changes noticeably, the dominant
blocks change, and a key frame can be extracted. The dominant
blocks of every frame are computed, and then feature vectors are
extracted from the dominant-block image of each frame and arranged
in a feature matrix. Singular Value Decomposition is used to calculate
the ranks of sliding windows over this matrix. Finally, the computed ranks
are traced, which allows the key frames of the video to be extracted.
Experimental results show that the proposed approach is robust
against a large range of digital effects used during shot transition.
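The rank-tracing step can be sketched as follows. This is a minimal illustration that assumes the per-frame feature vectors are already available as rows of a matrix; the function names, the window length, and the relative tolerance used to count significant singular values are hypothetical and are not taken from the paper.

```python
import numpy as np


def sliding_window_ranks(features, window, rel_tol=0.1):
    """Numerical rank of each block of `window` consecutive feature rows."""
    features = np.asarray(features, dtype=float)
    ranks = []
    for start in range(features.shape[0] - window + 1):
        block = features[start:start + window]
        # Singular values are returned in descending order.
        singular_values = np.linalg.svd(block, compute_uv=False)
        largest = singular_values[0]
        # Count singular values that are significant relative to the largest.
        rank = int(np.sum(singular_values > rel_tol * largest)) if largest > 0 else 0
        ranks.append(rank)
    return ranks


def key_frame_candidates(ranks, window):
    """Index of the frame that entered the window whenever the rank rises."""
    return [i + window - 1 for i in range(1, len(ranks)) if ranks[i] > ranks[i - 1]]


# Two synthetic "shots": six frames with one feature vector, then six with another.
shot_a = [[1.0, 0.0, 0.0]] * 6
shot_b = [[0.0, 1.0, 0.0]] * 6
ranks = sliding_window_ranks(shot_a + shot_b, window=4)
candidates = key_frame_candidates(ranks, window=4)
```

In this toy input the rank stays at 1 while a window covers a single shot and rises to 2 once the window straddles the transition, so the first frame of the second shot is flagged as a candidate.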
Abstract: A face recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame. Many algorithms have been proposed for face recognition. Vector Quantization (VQ) based face recognition is a novel approach to face recognition. Here, a new codebook generation method for VQ-based face recognition using Integrated Adaptive Fuzzy Clustering (IAFC) is proposed. IAFC is a fuzzy neural network which incorporates a fuzzy learning rule into a competitive neural network. The performance of the proposed algorithm is demonstrated using the publicly available AT&T database, Yale database, and Indian Face database, and a small face database, the DCSKU database, created in our lab. On all the databases, the proposed approach achieved a higher recognition rate than most of the existing methods. In terms of Equal Error Rate (EER), the proposed codebook is also better than the existing methods.
Abstract: This paper proposes a video-based framework for face recognition to identify which faces appear in a video sequence. Our basic idea resembles a tracking task: to track a selection of person candidates over time according to the observed visual features of face images in video frames. Hence, we employ a state-space model to formulate video-based face recognition by dividing the problem into two parts: the likelihood and the transition measures. The likelihood measure recognizes whose face is currently being observed in the video frames, for which two-dimensional linear discriminant analysis is employed. The transition measure estimates the probability of changing from an incorrect recognition at the previous stage to the correct person at the current stage. Moreover, extra nodes associated with head nodes are incorporated into our proposed state-space model. Experimental results are provided to demonstrate the robustness and efficiency of the proposed approach.
Abstract: Nowadays, video data embedding is a challenging and interesting approach to keeping real-time video data secure, and the technique can be used in high-level applications. The rate-distortion behavior of an image is not fixed, because the gains provided by accurate image frame segmentation are balanced by the inefficiency of coding objects of arbitrary shape, with losses that depend on both the coding scheme and the object structure. By using a rate controller in association with the encoder, one can dynamically adjust the target bitrate. This paper discusses how to secure videos by mixing in signature data with negligible distortion of the original video, keeping the steganographic video as close as possible in quality to the original. We propose a method for embedding the signature data into separate video frames using the block Discrete Cosine Transform. These frames are then encoded with a real-time H.264 encoding scheme. After processing, recovery of the original video and the signature data at the receiver end is proposed.
Abstract: The robustness of color-based signatures in the presence of a selection of representative distortions is investigated. Considered are five signatures that have been developed and evaluated within a new modular framework. Two signatures presented in this work are directly derived from histograms gathered from video frames. The other three signatures are based on temporal information by computing difference histograms between adjacent frames. In order to obtain objective and reproducible results, the evaluations are conducted based on several randomly assembled test sets. These test sets are extracted from a video repository that contains a wide range of broadcast content including documentaries, sports, news, movies, etc. Overall, the experimental results show the adequacy of color-histogram-based signatures for video fingerprinting applications and indicate which type of signature should be preferred in the presence of certain distortions.
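The two families of signatures mentioned above, histograms taken directly from frames and difference histograms between adjacent frames, can be sketched as follows. This is only an illustration: the joint RGB binning, the number of bins, the L1 comparison, and all function names are assumptions, and the five signatures actually evaluated in the paper are not specified here.

```python
def color_histogram(frame, bins=4):
    """Normalized joint RGB histogram of a frame given as a list of (r, g, b) pixels."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(rb * bins + gb) * bins + bb] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def difference_histograms(frames, bins=4):
    """Bin-wise difference between the histograms of adjacent frames."""
    hists = [color_histogram(frame, bins) for frame in frames]
    return [[b - a for a, b in zip(h0, h1)] for h0, h1 in zip(hists, hists[1:])]


def l1_distance(signature_a, signature_b):
    """Sum of absolute bin differences; 0 for identical signatures."""
    return sum(abs(a - b) for a, b in zip(signature_a, signature_b))


# Tiny synthetic frames: 16 pure-red pixels and 16 pure-blue pixels.
red = [(255, 0, 0)] * 16
blue = [(0, 0, 255)] * 16
static_signature = color_histogram(red)
temporal_signature = difference_histograms([red, blue, red])
```

A frame-based signature changes when a distortion shifts colours between bins, whereas a difference-based signature reacts only to changes between adjacent frames, which is one reason the two families may behave differently under a given distortion.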
Abstract: This paper presents an evaluation of a wavelet-based
digital watermarking technique used in estimating the quality of
video sequences transmitted over an Additive White Gaussian Noise
(AWGN) channel in terms of a classical objective metric, such as
Peak Signal-to-Noise Ratio (PSNR), without the need for the original
video. In this method, a watermark is embedded into the Discrete
Wavelet Transform (DWT) domain of the original video frames
using a quantization method. The degradation of the extracted
watermark can be used to estimate the video quality in terms of
PSNR with good accuracy. We calculated PSNR for video frames
contaminated with AWGN and compared the values with those
estimated using the Watermarking-DWT based approach. It is found
that the calculated and estimated quality measures of the video
frames are highly correlated, suggesting that this method can provide
a good quality measure for video frames transmitted over an AWGN
channel without the need for the original video.
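For reference, the full-reference PSNR that the watermark-based estimate is compared against can be computed as in the sketch below. It assumes 8-bit pixel values (peak 255) stored as a flat list and models the AWGN channel by adding zero-mean Gaussian noise without clipping; the helper names and noise levels are illustrative, and the watermark-based estimator itself is not reproduced here.

```python
import math
import random


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def add_awgn(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma (no clipping)."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in pixels]


# Synthetic "frame" of 4096 pixel values, distorted at two noise levels.
frame = [(7 * i) % 256 for i in range(4096)]
mild = psnr(frame, add_awgn(frame, sigma=2.0))
heavy = psnr(frame, add_awgn(frame, sigma=20.0))
```

Stronger channel noise gives a larger mean squared error and therefore a lower PSNR, which is the relationship the watermark degradation is meant to track when the original frame is unavailable.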
Abstract: This paper presents an enhanced frame-based video coding scheme. The input source video to the enhanced frame-based video encoder consists of a rectangular-size video and the shapes of arbitrarily-shaped objects on video frames. The rectangular frame texture is encoded by the conventional frame-based coding technique, and the video object's shape is encoded using contour-based vertex coding. It is possible to achieve several useful content-based functionalities by utilizing the shape information in the bitstream at the cost of a very small overhead to the bitrate.
Abstract: This paper proposes a novel, robust, and simple method for obtaining a human 3D face model and camera pose (position and orientation) from a video sequence. Given a video sequence of a face recorded with an off-the-shelf digital camera, the feature points used to define facial parts are tracked using the Active Appearance Model (AAM). Then, the face's 3D structure and the camera pose of each video frame can be calculated simultaneously from the obtained point correspondences. The proposed method is primarily based on the combined approaches of Gradient Descent and Powell's Multidimensional Minimization. With this method, temporarily occluded points, including the case of self-occlusion, do not pose a problem. As long as the point correspondences displayed in the video sequence have enough parallax, these missing points can still be reconstructed.
Abstract: This paper proposes a new design of a spatial FIR
filter to automatically detect the water level from a video signal of
various river surroundings. The approach applies
"addition" of frames and a "horizontal" edge detector to distinguish
the water region from the land region. The variance of each line of a filtered
video frame is used as a feature value. The water level is recognized
as the boundary line between the land region and the water region.
An edge detection filter essentially demarcates two distinctly
different regions. However, conventional filters do not
adapt automatically to detect the water level under the various lighting
conditions of river scenery. An optimized filter is proposed so that
the system becomes robust to changes in lighting conditions. The improved
reliability of the proposed system with the optimized filter is
confirmed by the accuracy of water level detection.
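The processing chain described above (frame addition, horizontal edge filtering, per-line variance, boundary search) can be sketched as follows. This is a simplified illustration, not the optimized filter of the paper: it assumes a plain two-tap vertical difference as the edge detector, assumes that the water occupies the bottom of the frame and yields low line variance after frame addition, and uses a hypothetical threshold ratio.

```python
from statistics import pvariance


def add_frames(frames):
    """Average co-located pixels over several frames ("addition" of frames)."""
    n = float(len(frames))
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def horizontal_edge_filter(image):
    """Two-tap vertical difference, which responds to horizontal edges."""
    return [[below - above for above, below in zip(image[r], image[r + 1])]
            for r in range(len(image) - 1)]


def row_variances(image):
    """Variance of each line of the filtered frame, used as the feature value."""
    return [pvariance(row) for row in image]


def estimate_water_level(variances, ratio=0.1):
    """Top row of the low-variance run at the bottom of the frame, or None."""
    threshold = ratio * max(variances)
    level = None
    for row in range(len(variances) - 1, -1, -1):
        if variances[row] >= threshold:
            break
        level = row
    return level


def synthetic_frame(ripple_sign, rows=10, cols=8, land_rows=5):
    """Textured land on top; water below with a ripple that flips sign per frame."""
    frame = []
    for r in range(rows):
        if r < land_rows:
            frame.append([100.0 if (r + c) % 2 else 0.0 for c in range(cols)])
        else:
            frame.append([50.0 + ripple_sign * 10.0 * (-1) ** (r + c)
                          for c in range(cols)])
    return frame


frames = [synthetic_frame(+1), synthetic_frame(-1)]
variances = row_variances(horizontal_edge_filter(add_frames(frames)))
water_level = estimate_water_level(variances)
```

In this synthetic input the ripples cancel when the two frames are added, so the water lines have zero variance after filtering while the textured land lines do not, and the boundary is found at the first water row.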
Abstract: In this paper, we propose a novel fast search algorithm for short MPEG video clips in a video database. The algorithm is based on the adjacent pixel intensity difference quantization (APIDQ) algorithm, which has previously been applied reliably to human face recognition. An APIDQ histogram is used as the feature vector of the frame image. Instead of fully decompressed video frames, partially decoded data, namely DC images, are used. Combined with active search [4], a temporal pruning algorithm, fast and robust video search can be realized. The proposed search algorithm has been evaluated on 6 hours of video, searching for 200 given MPEG video clips, each 15 seconds long. Experimental results show that the proposed algorithm can detect a similar video clip in merely 80 ms and achieves an Equal Error Rate (EER) of 3%, making it more accurate and robust than conventional fast video search algorithms.
Abstract: Wireless Capsule Endoscopy (WCE) has rapidly
found wide application in the medical domain over the last ten years
thanks to its noninvasiveness for patients and its support for thorough
inspection of a patient's entire digestive system, including
the small intestine. However, one of the main barriers to an efficient
clinical inspection procedure is that it requires a large amount of
effort for clinicians to inspect the huge volume of data collected during the
examination, i.e., over 55,000 frames of video. In this paper, we
propose a method to compute meaningful motion changes of
WCE by analyzing the obtained video frames based on regional
optical flow estimations. The computed motion vectors are used to
remove duplicate video frames caused by the WCE's imaging nature,
such as repetitive forward-backward motions from peristaltic
movements. The motion vectors are derived by calculating
directional component vectors in four local regions. Our
experiments are performed on the small intestine area, which is of
main interest to clinical experts when using WCEs, and our
experimental results show significant frame reductions compared
with a simple frame-to-frame similarity-based image reduction
method.
Abstract: Nowadays, with the emergence of new applications
such as robot control in image processing, artificial vision for visual
servoing is a rapidly growing discipline, and human-machine
interaction plays a significant role in controlling robots. This
paper presents a new algorithm based on spatio-temporal volumes for
visual servoing, aimed at controlling robots. In this algorithm, after
applying the necessary pre-processing to video frames, a spatio-temporal
volume is constructed for each gesture and a feature vector is extracted.
These volumes are then analyzed for matching in two consecutive
stages. For hand gesture recognition and classification we tested
different classifiers, including k-nearest neighbor, learning vector
quantization, and back-propagation neural networks. We tested the
proposed algorithm on the collected data set, and the results showed a
correct gesture recognition rate of 99.58 percent. We also tested the
algorithm on noisy images, where it achieved a correct
recognition rate of 97.92 percent.
Abstract: This paper introduces an intelligent system that can be applied to monitoring vehicle speed using a single camera. The ability to track motion is extremely useful in many automation problems, and a solution to this problem will open up many future applications. One of the most common problems in daily life is detecting the speed of vehicles on a highway. In this paper, a novel technique is developed to track multiple moving objects and estimate their speeds using a sequence of video frames. A field test has been conducted to capture real-life data, and the processed results are presented. Multiple-object problems and noise in the data are also considered. Implementing this system in real time is straightforward. The proposed system can accurately evaluate the position and orientation of moving objects in real time. The transformations and calibration between the 2D image and the actual road are also considered.
Abstract: Detection and tracking of the lip contour is an important
issue in speechreading. While there are solutions for lip tracking
once a good contour initialization in the first frame is available,
the problem of finding such a good initialization is not yet solved
automatically, but done manually. We have developed a new tracking
solution for lip contour detection using only a few landmarks (15
to 25) and applying the well-known Active Shape Models (ASM).
The proposed method is a new LMS-like adaptive scheme based on
an autoregressive (AR) model that has been fitted to the landmark
variations in successive video frames. Moreover, we propose an extra
motion compensation model to address more general cases in lip
tracking. Computer simulations demonstrate a fair match between
the true and the estimated spatial pixels. Significant improvements
relative to the well-known LMS approach have been obtained, as measured by a
defined Frobenius norm index.
Abstract: Video watermarking is usually considered as the watermarking of a set of still images. In the frame-by-frame watermarking approach, each video frame is seen as a single watermarked image, so collusion attacks are more critical in video watermarking. If the same or a redundant watermark is embedded in every frame of the video, the watermark can be estimated and then removed by a watermark estimation remodulation (WER) attack. If uncorrelated watermarks are used for every frame, these watermarks can be washed out with frame temporal filtering (FTF). The switching watermark system, the so-called SS-N system, performs better against WER and FTF attacks. In this system, the watermark for each frame is randomly picked from a finite pool of watermark patterns. First, the SS-N system is surveyed; then a new collusion attack on the SS-N system is proposed, using a new algorithm for separating video frames based on watermark pattern. In this way, N sets are built, each containing frames that carry the same watermark. After that, by applying the WER attack to every set, N different watermark patterns are estimated and then removed.
Abstract: Vision-based intelligent vehicle applications often require large amounts of memory to handle video streaming and image processing, which in turn increases the complexity of hardware and software. This paper presents an FPGA implementation of a vision-based blind spot warning system. Using video frames, the information of the blind spot area is reduced to one-dimensional information. Analysis of the estimated entropy of the image allows an object to be detected in time. This idea has been implemented on the XtremeDSP video starter kit. The blind spot warning system uses only 13% of its logic resources and 95 kbits of block memory, and its frame rate is over 30 frames per second (fps).
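The idea of reducing the blind spot region to one-dimensional information and analyzing its entropy can be sketched in software as follows. This is only an illustration under stated assumptions: the column-mean projection, the 16-bin quantization, the decision threshold, and all function names are hypothetical, and the paper's FPGA design is not reproduced.

```python
import math
from collections import Counter


def project_columns(region):
    """Collapse a 2D grayscale region to one value per column (column mean)."""
    height = float(len(region))
    return [sum(column) / height for column in zip(*region)]


def shannon_entropy(values, bins=16, max_value=256):
    """Shannon entropy in bits of the quantized value distribution."""
    counts = Counter(min(int(v * bins / max_value), bins - 1) for v in values)
    total = float(len(values))
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def object_present(region, threshold=0.5):
    """Flag an object when the 1D profile's entropy exceeds an assumed threshold."""
    return shannon_entropy(project_columns(region)) > threshold


# Uniform road surface versus a region containing dark and bright vehicle parts.
empty_road = [[120] * 8 for _ in range(4)]
with_vehicle = [[120, 120, 30, 30, 200, 200, 120, 120] for _ in range(4)]
```

A uniform road surface concentrates the profile in a single bin and gives zero entropy, while an intruding object spreads the profile over several bins and raises the entropy, so a simple comparison against a threshold is enough for this toy case.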