Abstract: In this work, we present what is, to the best of our knowledge, the first efficient digital watermarking scheme for MPEG Audio Layer 3 files that operates directly in the compressed data domain while manipulating the time and subband/channel domain. In addition, it does not need the original signal to detect the watermark. Our scheme was implemented with special care for the efficient use of the two limited resources of computer systems: time and space. It offers the industrial user watermark embedding and detection in a time directly comparable to the real playing time of the original audio file, which depends on the MPEG compression, while the end user does not experience any artifacts or delays when listening to the watermarked audio file. Furthermore, it overcomes the vulnerability of algorithms operating in the PCM data domain to compression/recompression attacks, as it places the watermark in the scale factor domain and not in the digitized audio data. The strength of our scheme, which allows it to be used successfully in both authentication and copyright protection, lies in the fact that ownership of the audio file is established not simply by detecting the bit pattern that comprises the watermark itself, but by showing that the legal owner knows a hard-to-compute property of the watermark.
Abstract: In this work we develop an object extraction method
and propose efficient algorithms for object motion characterization.
The set of proposed tools serves as a basis for the development of object-based
functionalities for manipulating video content. The
estimators produced by the different algorithms are compared in terms of quality
and performance and tested on real video sequences. The proposed
method will be useful for the latest standards for encoding and
describing multimedia content: MPEG-4 and MPEG-7.
Abstract: A talking head system (THS) is presented that animates
the face of a speaking 3D avatar so that it realistically
pronounces a given Korean text. The proposed system consists of
a SAPI-compliant text-to-speech (TTS) engine and an MPEG-4-compliant
face animation generator. The input to the THS is Unicode text that is
to be spoken with synchronized lip shapes. The TTS engine generates a
phoneme sequence with the duration of each phoneme and the audio data. It then
applies coarticulation rules to the phoneme sequence and sends a
mouth animation sequence to the face modeler. By using the face
animation generator, the proposed THS can produce more natural lip sync
and facial expressions than systems that use conventional visemes only.
The experimental results show that our system has great potential for
implementing a talking head for Korean text.
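The hand-off from the TTS engine to the face modeler described above can be pictured with a minimal sketch. It only shows how a phoneme sequence with durations might be turned into timed mouth keyframes. The function name, the phoneme symbols, and the mouth-shape labels are hypothetical, and the coarticulation rules of the actual system are not modeled here.

```python
def mouth_timeline(phonemes, shape_of):
    """Turn (phoneme, duration_ms) pairs into (start_ms, end_ms, shape) keyframes."""
    timeline = []
    t = 0
    for phoneme, duration_ms in phonemes:
        # Unknown phonemes fall back to a neutral mouth shape.
        shape = shape_of.get(phoneme, "neutral")
        timeline.append((t, t + duration_ms, shape))
        t += duration_ms
    return timeline


# Hypothetical phoneme-to-mouth-shape table and TTS output, for illustration only.
shape_of = {"m": "closed", "a": "open", "o": "rounded"}
phonemes = [("m", 80), ("a", 150), ("o", 120)]

for keyframe in mouth_timeline(phonemes, shape_of):
    print(keyframe)
```

A coarticulation stage would sit between the phoneme list and this timeline, adjusting each mouth shape according to its neighbors before the keyframes are sent on.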
Abstract: H.264/AVC offers considerably higher
coding efficiency than other compression standards such
as MPEG-2, but its computational complexity is significantly higher.
In this paper, we propose selective mode decision schemes for fast
intra prediction mode selection. The objective is to reduce the
computational complexity of the H.264/AVC encoder without
significant rate-distortion performance degradation. In our proposed
schemes, the intra prediction complexity is reduced by limiting the
luma and chroma prediction modes using the directional information
of the 16×16 prediction mode. Experimental results
show that the proposed schemes reduce the complexity by up to 78%
while maintaining similar PSNR quality, with an average bit-rate
increase of about 1.46%.
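The idea of limiting the candidate modes can be illustrated with a short sketch. The mode numbers follow the standard H.264 numbering for Intra 16×16 and Intra 4×4 prediction, but the candidate table is an assumption made for illustration and is not the mapping used in the paper. The cost function is likewise a hypothetical stand-in for the encoder's rate-distortion cost.

```python
# Standard H.264 mode numbering.
I16_VERTICAL, I16_HORIZONTAL, I16_DC, I16_PLANE = 0, 1, 2, 3
I4_MODES = {
    0: "vertical", 1: "horizontal", 2: "DC",
    3: "diagonal-down-left", 4: "diagonal-down-right",
    5: "vertical-right", 6: "horizontal-down",
    7: "vertical-left", 8: "horizontal-up",
}

# ASSUMPTION: illustrative candidate table, not taken from the paper.
# Each 16x16 mode keeps only the 4x4 modes with a similar direction, plus DC.
I4_CANDIDATES = {
    I16_VERTICAL: [0, 5, 7, 2],
    I16_HORIZONTAL: [1, 6, 8, 2],
    I16_DC: [2, 0, 1],
    I16_PLANE: [3, 4, 2],
}


def select_i4_mode(best_i16_mode, cost_of):
    """Pick the cheapest 4x4 mode among the restricted candidates only."""
    return min(I4_CANDIDATES[best_i16_mode], key=cost_of)


def search_reduction(best_i16_mode):
    """Fraction of 4x4 mode evaluations skipped for this 16x16 mode."""
    return 1 - len(I4_CANDIDATES[best_i16_mode]) / len(I4_MODES)


# Hypothetical per-mode costs for one 4x4 block.
costs = {0: 120, 1: 90, 2: 110, 3: 80, 4: 95, 5: 100, 6: 130, 7: 105, 8: 140}
print(select_i4_mode(I16_VERTICAL, costs.get), min(costs, key=costs.get))
```

With these made-up costs the restricted search settles on mode 5 while a full search would pick mode 3. That gap is the kind of rate-distortion loss that is traded for the reduced number of evaluations.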
Abstract: Unified Speech Audio Coding (USAC), the latest MPEG standard for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish the speech and audio segments of the input signal. Owing to the shortcomings of this system, introducing a well-designed orchestra/percussion classification and modifying the subsequent processing can greatly increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system that extracts only 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than the traditional 13 scales, and uses an Iterative Dichotomiser 3 (ID3) decision tree rather than a more complex learning method; the proposed algorithm therefore has lower computational complexity than most existing algorithms. Considering that frequent changes of attributes may lead to quality loss in the recovered audio signal, this paper also designs a modified subsequent process that helps the whole classification system reach an accuracy as high as 97%, which is comparable to the 99% of classical methods.
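ID3 builds its tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows that selection step on a toy data set. The three quantized features stand in for MFCC-derived attributes, and the feature values and labels are made up for illustration rather than taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Index of the feature that ID3 would split on at this node."""
    return max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))


# Hypothetical toy data: three quantized features per audio frame.
rows = [
    ("low", "high", "low"),
    ("low", "low", "low"),
    ("high", "high", "low"),
    ("high", "low", "low"),
]
labels = ["orchestra", "orchestra", "percussion", "percussion"]

print(best_split(rows, labels))
```

Here the first feature separates the two classes perfectly, so it is the one selected. A full ID3 implementation would recurse on each resulting subset until the labels in a node are pure.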
Abstract: Current image-based methods for recognizing individual
humans, such as fingerprint, face, or iris biometric modalities,
generally require a cooperative subject, views from certain aspects,
and physical contact or close proximity. These methods cannot
reliably recognize non-cooperating individuals at a distance in the
real world under changing environmental conditions. Gait, which
concerns recognizing individuals by the way they walk, is a relatively
new biometric without these disadvantages. The inherent gait
characteristic of an individual makes it irreplaceable and useful in
visual surveillance.
In this paper, an efficient gait recognition system for human
identification is proposed that extracts two features, namely the width vector of
the binary silhouette and MPEG-7 region-based shape
descriptors. In the proposed method, foreground objects,
i.e., humans and other moving objects, are extracted by estimating
the background with a Gaussian Mixture Model (GMM);
subsequently, median filtering is performed to remove
noise from the background-subtracted image. A moving target classification
algorithm is used to separate human beings (i.e., pedestrians)
from other foreground objects (viz., vehicles). Shape and boundary
information is used in the moving target classification algorithm.
Subsequently, the width vector of the outer contour of the binary silhouette
and the MPEG-7 Angular Radial Transform coefficients are taken as
the feature vector. Next, Principal Component Analysis (PCA)
is applied to the selected feature vector to reduce its dimensionality.
The extracted feature vectors are used to train a Hidden Markov
Model (HMM) to identify individuals. The proposed
system is evaluated on gait sequences, and the experimental
results show the efficacy of the proposed algorithm.
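As a minimal sketch of the first feature, the width vector can be read as the per-row distance between the leftmost and rightmost foreground pixels of the binary silhouette. That reading is an assumption, since the abstract does not define the vector precisely, and the function name and toy mask below are hypothetical.

```python
def width_vector(silhouette):
    """Per-row width of a binary silhouette given as rows of 0/1 values."""
    widths = []
    for row in silhouette:
        cols = [x for x, value in enumerate(row) if value]
        # Width spans the outer contour, so interior holes do not shrink it.
        widths.append(cols[-1] - cols[0] + 1 if cols else 0)
    return widths


# Hypothetical 4x6 silhouette mask (1 = foreground, 0 = background).
silhouette = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

print(width_vector(silhouette))
```

In a pipeline like the one described, one such vector per frame would be concatenated with the shape descriptor coefficients before dimensionality reduction.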