Abstract: Human motion recognition has been extensively increased in recent years due to its importance in a wide range of applications, such as human-computer interaction, intelligent surveillance, augmented reality, content-based video compression and retrieval, etc. However, it is still regarded as a challenging task especially in realistic scenarios. It can be seen as a general machine learning problem which requires an effective human motion representation and an efficient learning method. In this work, we introduce a descriptor based on Laban Movement Analysis technique, a formal and universal language for human movement, to capture both quantitative and qualitative aspects of movement. We use Discrete Hidden Markov Model (DHMM) for training and classification motions. We improve the classification algorithm by proposing two DHMMs for each motion class to process the motion sequence in two different directions, forward and backward. Such modification allows avoiding the misclassification that can happen when recognizing similar motions. Two experiments are conducted. In the first one, we evaluate our method on a public dataset, the Microsoft Research Cambridge-12 Kinect gesture data set (MSRC-12) which is a widely used dataset for evaluating action/gesture recognition methods. In the second experiment, we build a dataset composed of 10 gestures(Introduce yourself, waving, Dance, move, turn left, turn right, stop, sit down, increase velocity, decrease velocity) performed by 20 persons. The evaluation of the system includes testing the efficiency of our descriptor vector based on LMA with basic DHMM method and comparing the recognition results of the modified DHMM with the original one. Experiment results demonstrate that our method outperforms most of existing methods that used the MSRC-12 dataset, and a near perfect classification rate in our dataset.
Abstract: Nowadays, demand for using real-time video transmission capable devices is ever-increasing. So, high resolution videos have made efficient video compression techniques an essential component for capturing and transmitting video data. Motion estimation has a critical role in encoding raw video. Hence, various motion estimation methods are introduced to efficiently compress the video. Low bit‑depth representation based motion estimation methods facilitate computation of matching criteria and thus, provide small hardware footprint. In this paper, a hardware implementation of a two-bit transformation based low-complexity motion estimation method using local binary pattern approach is proposed. Image frames are represented in two-bit depth instead of full-depth by making use of the local binary pattern as a binarization approach and the binarization part of the hardware architecture is explained in detail. Experimental results demonstrate the difference between the proposed hardware architecture and the architectures of well-known low-complexity motion estimation methods in terms of important aspects such as resource utilization, energy and power consumption.
Abstract: This paper proposes and implements an core transform architecture, which is one of the major processes in HEVC video compression standard. The proposed core transform architecture is implemented with only adders and shifters instead of area-consuming multipliers. Shifters in the proposed core transform architecture are implemented in wires and multiplexers, which significantly reduces chip area. Also, it can process from 4×4 to 16×16 blocks with common hardware by reusing processing elements. Designed core transform architecture in 0.13um technology can process a 16×16 block with 2-D transform in 130 cycles, and its gate count is 101,015 gates.
Abstract: Efficient storage, transmission and use of video information are key requirements in many multimedia applications currently being addressed by MPEG-4. To fulfill these requirements, a new approach for representing video information which relies on an object-based representation, has been adopted. Therefore, objectbased watermarking schemes are needed for copyright protection. This paper proposes a novel blind object watermarking scheme for images and video using the in place lifting shape adaptive-discrete wavelet transform (SA-DWT). In order to make the watermark robust and transparent, the watermark is embedded in the average of wavelet blocks using the visual model based on the human visual system. Wavelet coefficients n least significant bits (LSBs) are adjusted in concert with the average. Simulation results shows that the proposed watermarking scheme is perceptually invisible and robust against many attacks such as lossy image/video compression (e.g. JPEG, JPEG2000 and MPEG-4), scaling, adding noise, filtering, etc.
Abstract: Full search block matching algorithm is widely used for hardware implementation of motion estimators in video compression algorithms. In this paper we are proposing a new architecture, which consists of a 2D parallel processing unit and a 1D unit both working in parallel. The proposed architecture reduces both data access power and computational power which are the main causes of power consumption in integer motion estimation. It also completes the operations with nearly the same number of clock cycles as compared to a 2D systolic array architecture. In this work sum of absolute difference (SAD)-the most repeated operation in block matching, is calculated in two steps. The first step is to calculate the SAD for alternate rows by a 2D parallel unit. If the SAD calculated by the parallel unit is less than the stored minimum SAD, the SAD of the remaining rows is calculated by the 1D unit. Early termination, which stops avoidable computations has been achieved with the help of alternate rows method proposed in this paper and by finding a low initial SAD value based on motion vector prediction. Data reuse has been applied to the reference blocks in the same search area which significantly reduced the memory access.
Abstract: This paper provides a flexible way of controlling
Variable-Bit-Rate (VBR) of compressed digital video, applicable to
the new H264 video compression standard. The entire video
sequence is assessed in advance and the quantisation level is then set
such that bit rate (and thus the frame rate) remains within
predetermined limits compatible with the bandwidth of the
transmission system and the capabilities of the remote end, while at
the same time providing constant quality similar to VBR encoding.
A process for avoiding buffer starvation by selectively eliminating
frames from the encoded output at times when the frame rate is slow
(large number of bits per frame) will be also described. Finally, the
problem of buffer overflow will be solved by selectively eliminating
frames from the received input to the decoder. The decoder detects
the omission of the frames and resynchronizes the transmission by
monitoring time stamps and repeating frames if necessary.
Abstract: In this paper, a new approach for quality assessment
tasks in lossy compressed digital video is proposed. The research
activity is based on the visual fixation data recorded by an eye
tracker. The method involved both a new paradigm for subjective
quality evaluation and the subsequent statistical analysis to match
subjective scores provided by the observer to the data obtained from
the eye tracker experiments. The study brings improvements to the
state of the art, as it solves some problems highlighted in literature.
The experiments prove that data obtained from an eye tracker can be
used to classify videos according to the level of impairment due to
compression. The paper presents the methodology, the experimental
results and their interpretation. Conclusions suggest that the eye
tracker can be useful in quality assessment, if data are collected and
analyzed in a proper way.
Abstract: A prototype for audio and video capture and compression in real time on a Linux platform has been developed. It is able to visualize both the captured and the compressed video at the same time, as well as the captured and compressed audio with the goal of comparing their quality. As it is based on free code, the final goal is to run it in an embedded system running Linux. Therefore, we would implement a node to capture and compress such multimedia information. Thus, it would be possible to consider the project within a larger one aimed at live broadcast of audio and video using a streaming server which would communicate with our node. Then, we would have a very powerful and flexible system with several practical applications.
Abstract: Motion estimation is the most computationally
intensive part in video processing. Many fast motion estimation
algorithms have been proposed to decrease the computational
complexity by reducing the number of candidate motion vectors.
However, these studies are for fast search algorithms themselves while
almost image and video compressions are operated with software
based. Therefore, the timing constraints for running these motion
estimation algorithms not only challenge for the video codec but also
overwhelm for some of processors. In this paper, the performance of
motion estimation is enhanced by using Intel's Streaming SIMD
Extension 2 (SSE2) technology with Intel Pentium 4 processor.