Abstract: Recognizing human actions from videos is an active
field of research in computer vision and pattern recognition. Human
activity recognition has many potential applications such as video
surveillance, human-machine interaction, sports video retrieval and
robot navigation. Currently, local descriptors and bag-of-visual-words
models achieve state-of-the-art performance for human action
recognition. The main challenge in feature description is how to
represent local motion information efficiently. Most previous
works focus on extending 2D local descriptors to 3D
ones to describe the local information around every interest point. In this
paper, we propose a new spatio-temporal descriptor based on a space-time
description of moving points. Our description is built on an
Accordion representation of video, which is well suited to recognizing
human actions from 2D local descriptors without the need for 3D
extensions. We use the bag-of-words approach to represent videos.
We quantize 2D local descriptors describing both temporal and spatial
features, with a good compromise between computational complexity
and action recognition rates. We have achieved impressive results on
publicly available action data sets.
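As an illustrative sketch only (not the authors' implementation), the bag-of-words step described above — assigning each 2D local descriptor to its nearest codebook word and accumulating a normalized histogram per video — might look like this; the codebook and descriptors here are placeholders:

```python
import math

def nearest_word(descriptor, codebook):
    # Index of the codebook word closest to the descriptor (Euclidean distance).
    dists = [math.dist(descriptor, w) for w in codebook]
    return dists.index(min(dists))

def bag_of_words(descriptors, codebook):
    # Normalized histogram of visual-word occurrences for one video.
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

The resulting histogram is what a classifier would consume in place of the raw descriptors.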
Abstract: In this paper we propose a method for the recognition of
adult videos based on support vector machines (SVM). Different kernel
features are proposed to classify adult videos. SVM has the advantage
that it is insensitive to the relative number of training examples in
the positive (adult video) and negative (non-adult video) classes. This
advantage is illustrated by comparing the performance of different
SVM kernels for the identification of adult videos.
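The abstract does not specify which kernels were compared; as a hedged sketch, the kernel functions most commonly compared in such studies (linear, polynomial, RBF) can be written as plain functions, with the hyperparameters `degree`, `c` and `gamma` being placeholder values:

```python
import math

def linear_kernel(x, y):
    # Inner product of two feature vectors.
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, degree=3, c=1.0):
    # Polynomial kernel: (<x, y> + c)^degree.
    return (linear_kernel(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian RBF kernel: exp(-gamma * ||x - y||^2).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)
```

Swapping the kernel changes the decision boundary the SVM can express, which is what such a comparison measures.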
Abstract: The monitored 3-Dimensional (3D) video experience can be utilized as “feedback information” to fine-tune service parameters and provide a better service to demanding 3D service customers. The 3D video experience, which includes both video quality and depth perception, is influenced by several contextual and content-related factors (e.g., ambient illumination conditions, content characteristics, etc.) due to the complex nature of 3D video. Therefore, the factors affecting this experience should be taken into account while assessing it. In this paper, the structural information of the depth map sequences of the 3D video is considered as a content-related factor affecting depth perception assessment. A cartoon-like filter is utilized to abstract the significant depth levels in the depth map sequences in order to determine the structural information. Moreover, subjective experiments are conducted using 3D videos associated with cartoon-like depth map sequences to investigate the effect of the ambient illumination condition, which is a contextual factor, on depth perception. Using the knowledge gained through this study, 3D video experience metrics can be developed to deliver better service to 3D video service users.
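The abstract does not describe the cartoon-like filter itself; a minimal stand-in for abstracting significant depth levels is posterization, i.e. quantizing each depth value into a few discrete depth planes. The sketch below is an assumption-laden illustration, not the paper's filter:

```python
def abstract_depth(depth_map, levels=4, max_depth=255):
    # Quantize each depth value into one of `levels` discrete depth planes,
    # a crude stand-in for cartoon-like abstraction of a depth map.
    step = (max_depth + 1) / levels
    return [[int(min(d // step, levels - 1) * step) for d in row]
            for row in depth_map]
```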
Abstract: In this paper we propose a method which improves the efficiency of video coding. Our method combines an adaptive GOP (group of pictures) structure with shot cut detection. We have analyzed different approaches to shot cut detection with the aim of choosing the most appropriate one. The next step is to align the GOP boundaries with the positions of the detected cuts during the video encoding process. Finally, the efficiency of the proposed method is confirmed by simulations, and the obtained results are compared with fixed GOP structures of sizes 4, 8, 12, 16, 32, 64 and 128 and with a GOP structure spanning the entire video. The proposed method achieved a bit rate gain from 0.37% to 50.59%, while providing a PSNR (Peak Signal-to-Noise Ratio) gain from 1.33% to 0.26% in comparison to the simulated fixed GOP structures.
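As a rough sketch of the idea (the actual cut detector and thresholds are not given in the abstract and are assumptions here), one can flag cuts from per-frame histogram differences and start a new GOP at every cut, capping the GOP length:

```python
def detect_cuts(hist_diffs, threshold):
    # hist_diffs[i] compares frame i+1 to frame i; a difference above
    # the threshold is treated as a shot cut at frame i+1.
    return [i + 1 for i, d in enumerate(hist_diffs) if d > threshold]

def gop_starts(num_frames, cuts, max_gop=16):
    # Start a new GOP (an I-frame) at frame 0, at every detected cut,
    # and whenever the current GOP would exceed max_gop frames.
    starts, last = [0], 0
    cutset = set(cuts)
    for f in range(1, num_frames):
        if f in cutset or f - last >= max_gop:
            starts.append(f)
            last = f
    return starts
```

Aligning I-frames with cuts avoids predicting across a scene change, which is where fixed GOP structures waste bits.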
Abstract: Sending encrypted messages frequently draws the attention
of third parties, perhaps causing attempts to break and
reveal the original messages. Steganography is introduced to hide
the existence of the communication by concealing a secret message
in an appropriate carrier such as text, image, audio or video. In quantum
steganography, the sender (Alice) embeds her steganographic
information into the cover and sends it to the receiver (Bob) over a
communication channel. Alice and Bob share an algorithm and hide
quantum information in the cover. An eavesdropper (Eve) without
access to the algorithm cannot detect the existence of the quantum
message. In this paper, a text quantum steganography technique is
proposed based on the use of the indefinite articles (a) or (an) in
conjunction with nonspecific or non-particular nouns in the English
language and a quantum gate truth table. The authors also introduce a
new code representation technique (SSCE - Secret Steganography
Code for Embedding) at both ends in order to achieve a high level of
security. Before the embedding operation, each character of the secret
message is converted to its SSCE value and then embedded into the cover
text. Finally, the stego text is formed and transmitted to the receiver.
At the receiver side, the reverse operations are carried out
to recover the original information.
Abstract: In this paper we present an algorithm which allows
near-real-time object tracking in Full HD videos.
The frame rate (FR) of a video stream is considered to be between
5 and 30 frames per second. Real-time track building is
achieved if the algorithm can process 5 or more frames per second. The
principal idea is to use fast algorithms during preprocessing to
obtain the key points and then to track them. The cost of matching
points during assignment depends heavily on the number of points.
Because of this, we have to limit the number of points, keeping only
the most informative ones.
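The selection of the most informative points can be sketched as follows, under the assumption (not stated in the abstract) that each key point carries a detector response score measuring its informativeness:

```python
def limit_keypoints(points, max_points):
    # points: list of (x, y, response) tuples; keep the max_points
    # with the highest detector response, i.e. the most informative.
    ranked = sorted(points, key=lambda p: p[2], reverse=True)
    return ranked[:max_points]
```

Since assignment-based matching grows quickly with the point count, capping it this way bounds the per-frame matching cost.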
Abstract: This paper reports on the feasibility of the ARMA model
for describing a bursty video source transmitting over an AAL5 ATM link
(VBR traffic). The traffic represents the activity of the action movie
"Lethal Weapon 3" transmitted over an ATM network using the FORE
Systems AVA-200 ATM video codec with a peak rate of 100 Mbps
and a frame rate of 25 frames per second. The model parameters were
estimated for a single video source and for independently multiplexed
video sources. It was found that the ARMA(2, 4) model is well suited
to the real data in terms of the average rate traffic profile,
probability density function, autocorrelation function, burstiness
measure, and the pole-zero distribution of the filter model.
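For reference, the ARMA(p, q) recursion underlying such a traffic model can be sketched as below; the coefficients here are placeholders, not the values fitted to the "Lethal Weapon 3" trace:

```python
def arma_simulate(ar, ma, noise):
    # ARMA(p, q) recursion:
    # x[t] = sum_i ar[i] * x[t-1-i] + noise[t] + sum_j ma[j] * noise[t-1-j]
    x = []
    for t in range(len(noise)):
        val = noise[t]
        for i, a in enumerate(ar):
            if t - 1 - i >= 0:
                val += a * x[t - 1 - i]
        for j, m in enumerate(ma):
            if t - 1 - j >= 0:
                val += m * noise[t - 1 - j]
        x.append(val)
    return x
```

An ARMA(2, 4) source as in the paper would use two AR coefficients and four MA coefficients, with `noise` drawn from a fitted innovation distribution.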
Abstract: The proposed Multimedia Pronunciation Learning
Management System (MPLMS) in this study is a technology with
profound potential for inducing improvement in pronunciation
learning. The MPLMS optimizes the digitised phonetic symbols with
the integration of text, sound and mouth movement video. The
components are designed and developed in an online management
system which turns the web into a dynamic user-centric collection of
consistent and timely information for quality sustainable learning.
The aim of this study is to design and develop the MPLMS which
serves as an innovative tool to improve English pronunciation. This
paper discusses the iterative methodology and the three-phase Alessi
and Trollip model in the development of the MPLMS. To match the
flexibility required in the development of educational software, an
iterative approach comprising planning, design, development, evaluation
and implementation is followed. To ensure the instructional
appropriateness of the MPLMS, the
instructional system design (ISD) model of Alessi and Trollip serves
as a platform to guide the important instructional factors and process.
It is expected that the results of future empirical research will support
the efficacy of MPLMS and its place as the premier pronunciation
learning system.
Abstract: A number of automated shot-change detection
methods for indexing a video sequence to facilitate browsing and
retrieval have been proposed in recent years. This paper focuses
on the simulation of video shot boundary detection using a
color histogram method in which scaling of the histogram
metrics is an added feature. The difference between the histograms of
two consecutive frames is evaluated, resulting in the metrics. Further
scaling of the metrics is performed to avoid ambiguity and to enable
the choice of an apt threshold for any type of video, including those
with minor errors due to flashlights, camera motion, etc. Two sample
videos with a resolution of 352 x 240 pixels are used here with the
color histogram approach on uncompressed media. An attempt is made
at the retrieval of color video. The simulation is performed for
abrupt changes in video and yields 90% recall and precision values.
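The metric and its scaling can be sketched as follows; this is an interpretation of the abstract (bin-wise absolute difference, rescaled so a single threshold suits any video), not the authors' exact formula:

```python
def hist_diff(h1, h2):
    # Sum of absolute bin-wise differences between two frame histograms.
    return sum(abs(a - b) for a, b in zip(h1, h2))

def scale_metrics(diffs, scale=100.0):
    # Rescale the raw per-frame differences to a fixed range so that
    # one threshold can be chosen for any type of video.
    peak = max(diffs)
    if peak == 0:
        return [0.0 for _ in diffs]
    return [scale * d / peak for d in diffs]
```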
Abstract: Delivering streaming video over wireless networks is an
important component of many interactive multimedia applications
running on personal wireless handset devices. Such personal devices
have to be inexpensive, compact, and lightweight. But wireless
channels have a high channel bit error rate and limited bandwidth.
Delay variation of packets due to network congestion and the high bit
error rate greatly degrade the quality of video at the handheld
device. Therefore, mobile access to multimedia content requires
video transcoding functionality at the edge of the mobile network for
interworking with heterogeneous networks and services, and to
guarantee the quality of service (QoS) delivered to the mobile user, a
robust and efficient transcoding scheme should be deployed in the
mobile multimedia transport network. Hence, this paper
examines the challenges and limitations that video transcoding
schemes face in mobile multimedia transport networks. Then, a
mobile and wireless video transcoding scheme based on handheld
resources, network conditions and content is proposed to provide
high-QoS applications. Exceptional performance is demonstrated in the
experimental results. These experiments were designed to verify and
prove the robustness of the proposed approach. Extensive
experiments have been conducted, and results for various video
clips with different bit rates and frame rates are provided.
Abstract: Wireless Capsule Endoscopy (WCE) has rapidly
found wide application in the medical domain over the last ten years
thanks to its noninvasiveness for patients and its support for thorough
inspection of a patient's entire digestive system, including the
small intestine. However, one of the main barriers to an efficient
clinical inspection procedure is the large amount of
effort required for clinicians to inspect the huge volume of data
collected during an examination, i.e., over 55,000 frames of video. In
this paper, we propose a method to compute meaningful motion changes
of the WCE by analyzing the obtained video frames based on regional
optical flow estimations. The computed motion vectors are used to
remove duplicate video frames caused by the WCE's imaging nature,
such as repetitive forward-backward motions from peristaltic
movements. The motion vectors are derived by calculating
directional component vectors in four local regions. Our
experiments are performed on the small intestine area, which is of
main interest to clinical experts when using WCEs, and our
experimental results show significant frame reductions compared
with a simple frame-to-frame similarity-based image reduction
method.
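As a simplified sketch of the four-region idea (the paper uses regional optical flow; here, as a stand-in, each quadrant's motion is approximated by its mean absolute pixel difference, and the thresholds are placeholders):

```python
def region_motion(prev, curr):
    # Mean absolute pixel difference in each of the four quadrants of
    # the frame, a crude stand-in for per-region motion magnitudes.
    h, w = len(curr), len(curr[0])
    mags = []
    for rs, re in ((0, h // 2), (h // 2, h)):
        for cs, ce in ((0, w // 2), (w // 2, w)):
            diffs = [abs(curr[r][c] - prev[r][c])
                     for r in range(rs, re) for c in range(cs, ce)]
            mags.append(sum(diffs) / len(diffs))
    return mags

def keep_frame(prev, curr, threshold=1.0):
    # Drop the frame when no quadrant shows enough motion,
    # treating it as a near-duplicate of the previous frame.
    return max(region_motion(prev, curr)) >= threshold
```

Pruning frames this way shrinks the 55,000-frame sequence to the subset that actually shows capsule movement.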
Abstract: Histogram equalization is often used in image enhancement, but it can also be used in auto exposure. However, conventional histogram equalization does not work well when many pixels are concentrated in a narrow luminance range. This paper proposes an auto exposure method based on 2-way histogram equalization. Two cumulative distribution functions are used, where one runs from dark to bright and the other from bright to dark. In this paper, the proposed auto exposure method is also designed and implemented for image signal processors with full-HD images.
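The two cumulative distribution functions at the core of the method can be sketched as below; how the auto exposure decision combines them is not specified in the abstract and is left out:

```python
def cumulative_dists(hist):
    # hist: pixel counts per luminance level (dark first).
    # Returns two normalized CDFs: dark-to-bright and bright-to-dark.
    total = sum(hist)
    d2b, acc = [], 0
    for h in hist:
        acc += h
        d2b.append(acc / total)  # fraction of pixels at this level or darker
    b2d, acc = [], 0
    for h in reversed(hist):
        acc += h
        b2d.append(acc / total)
    b2d.reverse()  # b2d[i] = fraction of pixels at level i or brighter
    return d2b, b2d
```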
Abstract: Real-time object tracking is a problem which involves the extraction of critical information from complex and uncertain image data. In this paper, we present a comprehensive methodology for designing an artificial neural network (ANN) for a real-time object tracking application. The object tracked for the purpose of demonstration is a specific airplane. However, the proposed ANN can be trained to track any other object of interest. The ANN has been simulated and tested on the training and testing datasets, as well as on a real-time streaming video. The tracking error is analyzed with a post-regression analysis tool, which finds the correlation between the calculated coordinates and the correct coordinates of the object in the image. The encouraging results from the computer simulation and analysis show that the proposed ANN architecture is a good candidate solution for a real-time object tracking problem.
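The post-regression analysis mentioned above boils down to a correlation between predicted and ground-truth coordinates; a minimal Pearson-correlation sketch (the paper's exact tool is not specified) is:

```python
import math

def pearson(xs, ys):
    # Pearson correlation between calculated coordinates (xs) and
    # the correct coordinates (ys); 1.0 means perfect tracking.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```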
Abstract: Mobile learning (m-learning) is a new method in the teaching and learning process which combines the technology of mobile devices with learning materials. It can enhance students' engagement in learning activities and allow them to access learning materials anytime and anywhere. In Kolej Poly-Tech Mara (KPTM), this method is seen as an important effort in teaching practice and in improving student learning performance. The aim of this paper is to discuss the development of an m-learning application called the Mobile EEF Learning System (MEEFLS), to be implemented for the Electric and Electronic Fundamentals course using Flash, XML (Extensible Markup Language) and J2ME (Java 2 Micro Edition). The System Development Life Cycle (SDLC) was used as the application development approach. The application has three modules: notes or course material, exercises, and video. The development of MEEFLS is seen as a tool and a pilot test for m-learning in KPTM.
Abstract: The ringing effect is one of the most annoying visual
artifacts in digital video and a significant factor in subjective
quality deterioration. However, there is a widely accepted
misunderstanding of its cause. In this paper, we propose a reasonable
interpretation of the cause of the ringing effect. Based on this
interpretation, we further suggest two methods to reduce the ringing
effect in DCT-based video coding. The methods adaptively adjust
quantizers according to video features. Our experiments show that the
methods can efficiently improve subjective quality with acceptable
additional computing costs.
Abstract: In today's competitive environment, security concerns have grown tremendously. In the modern world, possession is known to be nine-tenths of the law. Hence, it is imperative for one to be able to safeguard one's property from worldly harms such as theft, destruction of property, and people with malicious intent. Due to the advent of technology in the modern world, the methodologies used by thieves and robbers for stealing have been improving exponentially. Therefore, it is necessary for surveillance techniques to also improve with the changing world. With the improvement in mass media and various forms of communication, it is now possible to monitor and control the environment to the advantage of the owners of the property. The latest technologies used in the fight against theft and destruction are video surveillance and monitoring. By using these technologies, it is possible to monitor and capture every inch and second of the area of interest. However, the technologies used so far are passive in nature, i.e., the monitoring systems only help in detecting the crime but do not actively participate in stopping or curbing it while it takes place. Therefore, we have developed a methodology to detect motion in a video stream environment, with the aim of ensuring that monitoring systems not only detect crime but actively participate in stopping it while it is taking place. Hence, a system is used to detect any motion in a live streaming video; once motion has been detected in the live stream, the software activates a warning system and captures the live streaming video.
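A common baseline for motion detection in a live stream is frame differencing; the sketch below is a hedged illustration of that baseline (the paper's actual detector and both thresholds are assumptions), with the warning system reduced to a callback:

```python
def motion_detected(prev, curr, pixel_thresh=25, count_thresh=2):
    # Flag motion when enough pixels change luminance by more than
    # pixel_thresh between two consecutive frames.
    changed = sum(
        1
        for row_p, row_c in zip(prev, curr)
        for p, c in zip(row_p, row_c)
        if abs(c - p) > pixel_thresh
    )
    return changed >= count_thresh

def monitor(frames, on_alarm):
    # Scan consecutive frames; trigger the warning callback on motion.
    for prev, curr in zip(frames, frames[1:]):
        if motion_detected(prev, curr):
            on_alarm()
```

In a deployed system, `on_alarm` would start recording and raise the warning rather than merely being called.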
Abstract: A prototype for audio and video capture and compression in real time on a Linux platform has been developed. It is able to display both the captured and the compressed video at the same time, as well as the captured and compressed audio, with the goal of comparing their quality. As it is based on free code, the final goal is to run it on an embedded system running Linux, thereby implementing a node that captures and compresses such multimedia information. It would then be possible to consider the project within a larger one aimed at live broadcasting of audio and video using a streaming server communicating with our node. This would yield a very powerful and flexible system with several practical applications.
Abstract: More and more home videos are being generated with the ever-growing popularity of digital cameras and camcorders. For many home videos, a photo rendering, whether capturing a moment or a scene within the video, provides a complementary representation to the video. In this paper, a video motion mining framework for creative rendering is presented. The user's capture intent is derived by analyzing video motions, and respective metadata is generated for each capture type. The metadata can be used in a number of applications, such as creating video thumbnails, generating panorama posters, and producing slideshows of the video.