Abstract: Delivering streaming video over wireless networks is an
important component of many interactive multimedia applications
running on personal wireless handheld devices. Such personal devices
must be inexpensive, compact, and lightweight, yet wireless
channels suffer from high bit error rates and limited bandwidth.
Packet delay variation due to network congestion, combined with the
high bit error rate, greatly degrades video quality at the handheld
device. Mobile access to multimedia content therefore requires
video transcoding functionality at the edge of the mobile network for
interworking with heterogeneous networks and services. To guarantee
the quality of service (QoS) delivered to the mobile user, a
robust and efficient transcoding scheme must be deployed in the
mobile multimedia transport network. This paper examines the
challenges and limitations that video transcoding schemes face in
such networks, and then proposes a mobile and wireless video
transcoding scheme based on handheld resources, network conditions,
and content, to support high-QoS applications. Extensive experiments,
designed to verify the robustness of the proposed approach,
demonstrate excellent performance; results for various video clips
with different bit rates and frame rates are provided.
Abstract: This paper discusses cued speech recognition
methods in videoconferencing. Cued speech is a specific gesture
language that is used for communication among deaf people. We
define criteria for sentence intelligibility according to the answers
of test subjects (deaf people). In our tests we use 30 sample videos
coded with the H.264 codec at various bit rates and various cued
speech speeds. Additionally, we define criteria for consonant sign
recognizability in the single-handed finger alphabet (dactyl), by
analogy with acoustics. We use another 12 sample videos coded with
the H.264 codec at various bit rates in four different video formats.
To interpret the results, we apply the standard scale for subjective
video quality evaluation and a percentage-based evaluation of
intelligibility, as in acoustics. From the results we derive minimum
coded bit-rate recommendations for each spatial resolution.
Abstract: As is well known, buoyancy and drag forces govern a bubble's rise velocity in a liquid column. These forces depend strongly on the fluid properties, gravity, and the bubble's equivalent diameter. This study reports a set of bubble rise velocity experiments in a liquid column using water or glycerol, in which several records of terminal velocity were obtained. The results show that a bubble's terminal rise velocity depends strongly on the dynamic viscosity. The data set covers terminal velocities in the range 8.0–32.9 cm/s, with Reynolds numbers ranging from 1.3 to 7490. The bubbles' motion was recorded with a video camera. The main goal is to present an original data set and results, which are discussed in terms of two-phase flow theory. The prediction of the terminal velocity of a single bubble in a liquid, as well as the range of applicability of such predictions, is also discussed. In conclusion, this study presents general expressions for determining the terminal velocity of isolated gas bubbles over a range of Reynolds numbers, when the fluid properties are known.
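As a concrete illustration of the viscosity dependence discussed above, the classical Stokes expression for the terminal velocity of a small bubble at very low Reynolds number can be evaluated numerically. This is a standard two-phase flow relation, not necessarily the correlation derived in the paper; the fluid property values below are typical textbook figures for glycerol and air, chosen only for illustration.

```python
# Stokes-regime terminal velocity of a small gas bubble in a viscous liquid:
#   v_t = g * d^2 * (rho_l - rho_g) / (18 * mu_l),  valid for Re << 1.

G = 9.81  # gravitational acceleration, m/s^2

def stokes_terminal_velocity(d, rho_l, rho_g, mu_l):
    """Terminal rise velocity (m/s) of a sphere of diameter d (m)."""
    return G * d**2 * (rho_l - rho_g) / (18.0 * mu_l)

def reynolds(v, d, rho_l, mu_l):
    """Bubble Reynolds number based on the liquid properties."""
    return rho_l * v * d / mu_l

# Illustrative values: a 1 mm air bubble in glycerol (approximate properties).
d, rho_l, rho_g, mu_l = 1e-3, 1260.0, 1.2, 1.4
v = stokes_terminal_velocity(d, rho_l, rho_g, mu_l)
re = reynolds(v, d, rho_l, mu_l)
```

For this viscous case the computed Reynolds number is far below 1, so the Stokes assumption is self-consistent; for the water experiments above (Reynolds numbers up to 7490) a drag correlation valid at intermediate and high Reynolds numbers would be needed instead.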
Abstract: Arbitrarily shaped video objects are an important
concept in modern video coding methods. The techniques presently
used are based not on image elements but rather on video objects
having an arbitrary shape. In this paper, spatial shape error
concealment techniques for object-based video in error-prone
environments are proposed. We consider a geometric shape
representation consisting of the object boundary, which can be
extracted from the α-plane. Three different approaches are used to
replace a missing boundary segment: Bézier interpolation, Bézier
approximation, and NURBS approximation. Experimental results on
object shapes of varying concealment difficulty demonstrate the
performance of the proposed methods. Comparisons among the proposed
methods are also presented.
Abstract: Extracting in-play scenes from sports videos is essential
for quantitative analysis and effective video browsing of sports
activities. Game analysis of badminton, as with other racket sports,
requires detecting the start and end of each rally period in an
automated manner. This paper describes an automatic serve scene
detection method employing cubic higher-order local auto-correlation
(CHLAC) and multiple regression analysis (MRA). By virtue of its
shift-invariance and additivity, CHLAC can extract features of the
postures and motions of multiple persons without segmenting and
tracking each person, and it requires no prior knowledge. Specific
scenes, such as serves, are then detected from the CHLAC features by
multiple regression analysis. To demonstrate the effectiveness of
our method, experiments were conducted on video sequences of five
badminton matches captured by a single ceiling camera. The average
precision and recall rates for serve scene detection were 95.1% and
96.3%, respectively.
Abstract: The fine structure of supercavitation in the wake of a
symmetrical cylinder is studied with high-speed video cameras. The
flow is observed in a cavitation tunnel at a speed of 8 m/s, when the
sidewall and the wake are partially filled with massive cavitation
bubbles. In the present experiment, a two-dimensional ripple wave
with a wavelength of 0.3 mm was observed to propagate in the
downstream direction and then abruptly thicken into a
three-dimensional layer. IR photography showed that the wakes
originated from the horseshoe vortices alongside the cylinder. The
wake developed into the dead-water zone, which absorbed the bubbly
wake propelled from the vortices separating at the center of the
cylinder. A remote-sensing classification technique (maximum
likelihood) determined that the surface porosity was 0.2 and that the
mean speed in the mixed wake was 7 m/s. To confirm the existence of
two-dimensional wave motions at the interface, experiments were also
conducted at a very low frequency, and showed similar gravity waves
in both the upper and lower interfaces.
Abstract: In this paper, we propose a fully-utilized, block-based 2D DWT (discrete wavelet transform) architecture, which consists of four 1D DWT filters with a two-channel QMF lattice structure. The proposed architecture requires about 2MN−3N registers to save the intermediate results for higher-level decomposition, where M and N stand for the filter length and the row width of the image, respectively. Furthermore, the proposed 2D DWT processes in the horizontal and vertical directions simultaneously without an idle period, so that it computes the DWT for an N×N image in a period of N^2(1 − 2^(−2J))/3, where J is the number of decomposition levels. Compared to existing approaches, the proposed architecture shows 100% hardware utilization and high throughput rates. To mitigate the long critical path delay due to the cascaded lattices, a four-stage pipeline can be applied while retaining 100% hardware utilization. The proposed architecture can be applied in real-time video signal processing.
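The quoted processing period N^2(1 − 2^(−2J))/3 is exactly the geometric sum of per-level contributions N^2/4^j for levels j = 1..J. The per-level interpretation below is an assumption used only to sanity-check the closed form against the geometric series; it is not taken from the paper.

```python
# Check that the closed-form period N^2 * (1 - 2^(-2J)) / 3 equals the
# geometric sum of assumed per-level contributions N^2 / 4^j, j = 1..J.

def period_closed_form(N, J):
    return N * N * (1 - 2 ** (-2 * J)) / 3

def period_level_sum(N, J):
    return sum(N * N / 4 ** j for j in range(1, J + 1))

# Compare the two expressions over a few image widths and level counts.
checks = [(period_closed_form(N, J), period_level_sum(N, J))
          for N in (64, 256, 512) for J in (1, 3, 5)]
```

Note that as J grows the period approaches N^2/3, i.e., roughly a third of the pixel count, which is what makes the 100% hardware utilization claim plausible for a four-filter architecture.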
Abstract: Human activity is a major concern in a wide variety of
applications, such as video surveillance, human computer interface
and face image database management. Detecting and recognizing
faces is a crucial step in these applications. Furthermore, major
advancements and initiatives in security applications in the past years
have propelled face recognition technology into the spotlight. The
performance of existing face recognition systems declines significantly
if the resolution of the face image falls below a certain level.
This is especially critical in surveillance imagery where often, due to
many reasons, only low-resolution video of faces is available. If these
low-resolution images are passed to a face recognition system, the
performance is usually unacceptable. Hence, resolution plays a key
role in face recognition systems. In this paper we introduce a new
low-resolution face recognition system based on a mixture of expert
neural networks. To produce the low-resolution input images, we
down-sampled the 48 × 48 ORL images to 12 × 12 using nearest-neighbor
interpolation; applying bicubic interpolation then yields enhanced
images, which are given to a Principal Component Analysis feature
extractor. Comparison with some of the most closely related methods
indicates that the proposed model yields excellent recognition rates
in low-resolution face recognition: 100% on the training set and
96.5% on the test set.
Abstract: Multimedia information availability has increased
dramatically with the advent of video broadcasting on handheld
devices. With this availability, however, come problems of
maintaining the security of information that is displayed in public.
ISMA Encryption and Authentication (ISMACryp) is one of the chosen
technologies for service protection in DVB-H (Digital Video
Broadcasting - Handheld), the TV system for portable handheld
devices. ISMACryp encrypts content encoded with H.264/AVC (advanced
video coding) while leaving all structural data untouched. Two modes
of ISMACryp are available: CTR mode (Counter) and CBC mode (Cipher
Block Chaining). Both are based on the 128-bit AES algorithm. AES is
computationally complex and requires considerable execution time,
which is unsuitable for real-time applications such as live TV. The
proposed system aims to gain a deep understanding of video data
security in multimedia technologies and to provide security for
real-time video applications using selective encryption for
H.264/AVC. Five levels of security are proposed in this paper, based
on the content of the NAL unit in the Constrained Baseline profile
of H.264/AVC. The selective encryption at the different levels
encrypts the intra-prediction modes, residue data, inter-prediction
modes, or motion vectors only. Experimental results show that the
fifth level, which is ISMACryp, provides the highest level of
security at the cost of the longest encryption time, while the first
level provides the lowest level of security by encrypting only the
motion vectors, with the lowest execution time and without
compromising the compression or the quality of the visual content.
The encryption scheme adds little cost to the compression process
and keeps the file format unchanged, with some direct operations
still supported. Simulations were carried out in MATLAB.
Abstract: The modern telecommunication industry demands
higher-capacity networks with high data rates. Orthogonal frequency
division multiplexing (OFDM) is a promising technique for
high-data-rate wireless communications at reasonable complexity in
wireless channels. OFDM has been adopted for many types of wireless
systems, such as wireless local area networks (e.g., IEEE 802.11a)
and digital audio/video broadcasting (DAB/DVB). The proposed research
focuses on a concatenated coding scheme that improves the
performance of OFDM-based wireless communications. It uses a
Redundant Residue Number System (RRNS) code as the outer code
and a convolutional code as the inner code. A direct conversion of
the analog signal to the residue domain is performed using a
sigma-delta-based parallel analog-to-residue converter to reduce the
conversion complexity. The bit error rate (BER) performances of the proposed
system under different channel conditions are investigated. These
include the effect of additive white Gaussian noise (AWGN),
multipath delay spread, peak power clipping and frame start
synchronization error. The simulation results show that the proposed
RRNS-Convolutional concatenated coding (RCCC) scheme provides
significant improvement in the system performance by exploiting the
inherent properties of RRNS.
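To make the outer-code idea concrete, the following sketch shows how an integer is represented in a redundant residue number system: residues are taken with respect to pairwise coprime moduli, extra (redundant) moduli add error-detection capability, and the value is recovered from the information moduli via the Chinese Remainder Theorem. The moduli set below is a small illustrative choice, not the one used in the paper.

```python
# Redundant Residue Number System (RRNS) sketch.
# The information moduli define the legitimate range; the redundant moduli
# add residues that let a receiver detect corrupted codewords.

INFO_MODULI = (7, 11, 13)      # legitimate range: 0 .. 7*11*13 - 1 = 1000
REDUNDANT_MODULI = (17, 19)    # extra residues for error detection

def rrns_encode(x):
    """Represent x as residues modulo all (information + redundant) moduli."""
    return tuple(x % m for m in INFO_MODULI + REDUNDANT_MODULI)

def crt_decode(residues, moduli):
    """Chinese Remainder Theorem reconstruction from the given moduli."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m): modular inverse
    return x % M

def rrns_detect_error(residues):
    """Decode using ALL moduli; a legitimate codeword must fall back inside
    the information range, otherwise an error is declared."""
    full = crt_decode(residues, INFO_MODULI + REDUNDANT_MODULI)
    return full >= 7 * 11 * 13         # True means an error was detected

codeword = rrns_encode(542)
decoded = crt_decode(codeword[:3], INFO_MODULI)
corrupted = (codeword[0] ^ 1,) + codeword[1:]   # flip one bit of one residue
```

Because each residue digit is independent, a single corrupted residue pushes the full-CRT reconstruction outside the legitimate range, which is the redundancy the concatenated scheme exploits.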
Abstract: In this paper, we propose novel algorithmic models
based on information fusion and feature transformation in a
cross-modal subspace for detecting digital video tampering or
forgery, using several types of residue features extracted from
intra-frame and inter-frame pixel sub-blocks in video sequences. An
evaluation of the proposed residue features (the noise residue
features and the quantization features), their transformation in the
cross-modal subspace, and their multimodal fusion, on an emulated
copy-move tamper scenario, shows a significant improvement in tamper
detection accuracy compared with single-mode features without the
cross-modal transformation.
Abstract: Using vision-based solutions in intelligent vehicle applications often requires large memory to handle the video stream and image processing, which increases the complexity of the hardware and software. In this paper, we present an FPGA implementation of a vision-based lane departure warning system. For each video frame, line gradients are estimated and the lane marks are found. By analyzing the positions of the lane marks, departure of the vehicle is detected in time. This idea has been implemented on a Xilinx Spartan-6 FPGA. The lane departure warning system uses 39% of the device's logic resources and none of its memory. The average availability is 92.5%. The frame rate is more than 30 frames per second (fps).
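The gradient-based lane-mark step can be sketched in software. The snippet below is a plain NumPy illustration of the idea (a horizontal intensity gradient plus thresholding on a synthetic frame), not the authors' FPGA pipeline; the stripe position and threshold are arbitrary illustrative values.

```python
import numpy as np

# Synthetic grayscale frame with one bright vertical lane mark.
frame = np.zeros((40, 64))
frame[:, 30:34] = 1.0  # lane mark occupies columns 30..33

# Central-difference horizontal gradient: strong response at mark edges.
gx = np.abs(frame[:, 2:] - frame[:, :-2])

# Threshold the gradient and collect candidate lane-edge columns.
edge_mask = gx > 0.5
edge_cols = np.where(edge_mask.any(axis=0))[0] + 1  # +1 undoes slicing offset
```

On this synthetic frame the detected columns bracket the two edges of the stripe; a departure decision would then track the lane-mark positions against the expected in-lane region over successive frames.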
Abstract: In the 3D wavelet video coding framework, temporal
filtering is performed along motion trajectories using Motion
Compensated Temporal Filtering (MCTF). Hence, a computationally
efficient motion estimation technique is needed for MCTF. In this
paper, a predictive technique is proposed to reduce the
computational complexity of the MCTF framework by exploiting
the high correlation among the frames in a group of pictures (GOP).
The proposed technique applies the coarse and fine searches of any
fast block-based motion estimation algorithm only to the first pair
of frames in a GOP. The generated motion vectors are supplied to the
subsequent frames, even at subsequent temporal levels, and only a
fine search is carried out around those predicted motion vectors.
Hence, the coarse search is skipped for all motion estimation in a
GOP except for the first pair of frames. The technique has been
tested with different fast block-based motion estimation algorithms
on different standard test sequences using MC-EZBC, a
state-of-the-art scalable video coder. The simulation results reveal
a substantial reduction (20.75% to 38.24%) in the number of search
points during motion estimation, without compromising the quality of
the reconstructed video compared to non-predictive techniques. Since
the motion vectors of every pair of frames in a GOP except the first
lie within ±1 of the motion vectors of the previous pair, the number
of bits required for motion vectors is also reduced by 50%.
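The fine-search-only refinement can be illustrated with a small SAD-based search restricted to a ±1 window around the predicted motion vector. This is a generic sketch of the idea described above, not code from the paper; the block size, frame size, and predicted vector are illustrative.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def fine_search(cur_block, ref, top_left, pred_mv, radius=1):
    """Refine pred_mv = (dy, dx) with an exhaustive +/-radius search only,
    skipping the coarse search stage entirely."""
    y, x = top_left
    bs = cur_block.shape[0]
    best_mv, best_cost = pred_mv, None
    for dy in range(pred_mv[0] - radius, pred_mv[0] + radius + 1):
        for dx in range(pred_mv[1] - radius, pred_mv[1] + radius + 1):
            cand = ref[y + dy:y + dy + bs, x + dx:x + dx + bs]
            cost = sad(cur_block, cand)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv

# Synthetic check: the current block is the reference block shifted by (1, 2),
# and the predicted vector from the previous frame pair is (0, 2).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
cur_block = ref[9:17, 10:18]          # true motion vector (dy=1, dx=2)
mv = fine_search(cur_block, ref, top_left=(8, 8), pred_mv=(0, 2))
```

Since the predicted vector is within ±1 of the true one, the 9-point fine search recovers the true vector; this is exactly the correlation assumption that lets the coarse stage be skipped for all but the first frame pair of a GOP.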
Abstract: The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity of integrating two radically different information-capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor that simplifies the hardware integration difficulties. Using on-chip image processing, this smart sensor calculates in real time the X/Y projections of the captured image. This on-chip projection considerably reduces the volume of the output data, permitting transmission of the condensed visual information via the same audio channel by using the stereophonic input available on most standard computation devices such as PCs, PDAs, and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised in a standard 0.35 µm CMOS technology. A preliminary experiment gives encouraging results. Its efficiency will be further investigated in a wide variety of applications, such as biometrics, speech recognition in noisy environments, and vocal control for military use or for disabled persons.
Abstract: Wireless capsule endoscopy (WCE) has rapidly found
wide application in the medical domain over the last ten years,
thanks to its noninvasiveness for patients and its support for
thorough inspection of a patient's entire digestive system,
including the small intestine. However, one of the main barriers to
an efficient clinical inspection procedure is the large amount of
effort clinicians must expend to inspect the huge volume of data
collected during an examination, over 55,000 video frames. In this
paper, we propose a method to compute meaningful motion changes of
the WCE by analyzing the obtained video frames based on regional
optical flow estimations. The computed motion vectors are used to
remove duplicate video frames caused by the WCE's imaging nature,
such as the repetitive forward-backward motions arising from
peristaltic movements. The motion vectors are derived by calculating
directional component vectors in four local regions. Our experiments
are performed on the small intestine area, which is of main interest
to clinical experts using WCE, and our experimental results show
significant frame reductions compared with a simple frame-to-frame
similarity-based image reduction method.
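The regional directional-vector idea can be sketched as follows: the dense optical-flow field of a frame pair is split into four quadrants, a mean motion vector is computed per quadrant, and a frame is treated as a near-duplicate when all four regional motions are small. This is a simplified NumPy illustration of the scheme described above; the threshold is an arbitrary illustrative value, and a real system would use flow estimated from the WCE frames.

```python
import numpy as np

def regional_motion(flow):
    """Mean motion vector in each of the four quadrants.
    flow: H x W x 2 array of per-pixel (dy, dx) optical-flow vectors."""
    h, w = flow.shape[0] // 2, flow.shape[1] // 2
    quads = (flow[:h, :w], flow[:h, w:], flow[h:, :w], flow[h:, w:])
    return [q.reshape(-1, 2).mean(axis=0) for q in quads]

def is_near_duplicate(flow, tol=0.5):
    """A frame adds little new content if every regional motion is tiny."""
    return all(np.linalg.norm(v) < tol for v in regional_motion(flow))

# Synthetic flow fields: one static frame pair, one with quadrant motion.
static_flow = np.zeros((64, 64, 2))
moving_flow = np.zeros((64, 64, 2))
moving_flow[:32, :32, 1] = 3.0   # top-left quadrant shifts 3 px rightward
```

Averaging per region rather than over the whole frame keeps a localized peristaltic push from being washed out by the static remainder of the image, which is why any single large regional vector is enough to keep the frame.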
Abstract: Organizing video databases is becoming a difficult
task as the amount of video content increases. Video classification
based on the content of videos can significantly increase the speed
of tasks such as browsing and searching for a particular video in a
database. In this paper, a content-based video classification system
for the classes indoor and outdoor is presented. The system is
intended to be used on a mobile platform with modest resources. The
algorithm makes use of the temporal redundancy in videos, which
allows using an uncomplicated classification model while still
achieving reasonable accuracy. Training and evaluation were done on
a database of 443 videos downloaded from a video-sharing service. A
total accuracy of 87.36% was achieved.
Abstract: Mixture formation prior to the ignition process
plays a key role in diesel combustion. Parametric studies of mixture
formation and the ignition process under various injection
parameters have received considerable attention for their potential
to reduce emissions. The purpose of this study is to clarify the
effects of injection pressure on mixture formation and ignition,
especially during the ignition delay period, which significantly
influences the subsequent combustion process and exhaust emissions.
This study investigated the fundamental effects of injection
pressure on diesel combustion using a rapid compression machine. The
detailed behavior of mixture formation during the ignition delay
period was investigated using a schlieren photography system with a
high-speed camera. This method can clearly capture spray
evaporation, spray interference, mixture formation, and flame
development in real images. The ignition process and flame
development were investigated by direct photography using a
light-sensitive high-speed color digital video camera. The injection
pressure and air motion are important variables that strongly affect
fuel evaporation and the endothermic and pyrolysis processes during
the ignition delay. An increased injection pressure lengthens the
spray tip penetration and promotes greater fuel-air mixing during
the ignition delay. The greater quantity of fuel prepared during the
ignition delay period in turn promotes more rapid heat release.
Abstract: With the rapid popularization of internet services, it is apparent that next-generation terrestrial communication systems must be capable of supporting various applications such as voice, video, and data. This paper presents the performance evaluation of turbo-coded mobile terrestrial communication systems, which are capable of providing high-quality services for delay-sensitive (voice or video) and delay-tolerant (text transmission) multimedia applications in urban and suburban areas. Different types of multimedia information require different service qualities, which are generally expressed in terms of a maximum acceptable bit error rate (BER) and a maximum tolerable latency. The breakthrough discovery of turbo codes allows us to significantly reduce the probability of bit errors with feasible latency. In a turbo-coded system, a trade-off between latency and BER results from the choice of convolutional component codes, interleaver type and size, decoding algorithm, and number of decoding iterations. This trade-off can be exploited for multimedia applications by using optimal and suboptimal combinations of performance parameters to achieve different service qualities. The results therefore suggest an adaptive framework for turbo-coded wireless multimedia communications, incorporating a set of performance parameters that achieve an appropriate set of service qualities, depending on the application's requirements.
Abstract: Corner detection and optical flow are common techniques for feature-based video stabilization. However, these algorithms are computationally expensive and must be performed at a reasonable rate. This paper presents an algorithm that discards irrelevant feature points and maintains the rest for future use, so as to reduce the computational cost. The algorithm starts by initializing a maintained set. The feature points in the maintained set are examined for their accuracy for modeling. Corner detection is required only when the maintained feature points are insufficiently accurate for further modeling. Optical flows are then computed from the maintained feature points toward the consecutive frame. After that, a motion model is estimated based on the simplified affine motion model and the least squares method, with outliers belonging to moving objects present. Studentized residuals are used to eliminate such outliers. The model estimation and elimination processes repeat until no more outliers are identified. Finally, the entire algorithm repeats along the video sequence, with the points remaining from the previous iteration used as the maintained set. As a practical application, efficient video stabilization can be achieved by exploiting the computed motion models. Our study shows that the number of times corner detection needs to be performed is greatly reduced, significantly lowering the computational cost. Moreover, optical-flow vectors are computed only for the maintained feature points, not for outliers, which also reduces the computational cost. In addition, the feature points remaining after reduction are sufficient for background-object tracking, as demonstrated in a simple video stabilizer based on our proposed algorithm.
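The estimate-then-eliminate loop described above can be sketched with ordinary least squares and a residual-based rejection rule. The snippet uses a full 6-parameter affine model and a simple scaled-residual test as a stand-in for proper studentized residuals; it is an illustrative sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: dst ~= [src | 1] @ P, with P a 3x2 matrix."""
    X = np.hstack([src, np.ones((len(src), 1))])
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return P

def robust_affine(src, dst, k=2.5, max_iter=10):
    """Iteratively refit, rejecting points whose residual exceeds k times
    the inlier residual spread (a simplified stand-in for studentized
    residuals), until the inlier set stops changing."""
    keep = np.ones(len(src), dtype=bool)
    X = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(max_iter):
        P = fit_affine(src[keep], dst[keep])
        resid = np.linalg.norm(dst - X @ P, axis=1)
        new_keep = resid < k * (resid[keep].std() + 1e-9)
        if (new_keep == keep).all():
            break
        keep = new_keep
    return P, keep

# Synthetic check: 20 points following a known affine map plus 3 gross
# outliers standing in for feature points on moving objects.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(23, 2))
P_true = np.array([[1.02, 0.05], [-0.05, 1.02], [3.0, -2.0]])
dst = np.hstack([src, np.ones((23, 1))]) @ P_true
dst[20:] += 100.0                      # last three points are outliers
P_est, inliers = robust_affine(src, dst)
```

The recovered model matches the ground-truth background motion while the moving-object points are flagged, which is exactly the separation the stabilizer needs before warping the frame.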
Abstract: In this paper, we present an approach to soccer video
editing using multimodal annotation. We propose to associate with
each video sequence of a soccer match a textual document to be used
for further exploitation such as search, browsing, and summary
editing. The textual document contains video metadata, match
metadata, and match data. This document, generated automatically
while the video is analyzed, segmented, and classified, can be
enriched semi-automatically according to the user type and/or a
specialized recommendation system.