Abstract: Human pose estimation and tracking aim to accurately identify and locate the positions of human joints in video. This computer vision task is of great significance for human motion recognition, behavior understanding, and scene analysis. There has been remarkable progress on human pose estimation in recent years. However, more research is needed on human pose tracking, especially online tracking. In this paper, a framework called PoseSRPN is proposed for online single-person pose estimation and tracking. We attach a pose estimation branch to a Siamese network to incorporate Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) into one framework. The pose estimation branch has a simple network structure that replaces the complex upsampling and convolution network structure with deconvolution. By augmenting the loss of the fully convolutional Siamese network with the pose estimation task, pose estimation and tracking can be trained in one stage. Once trained, PoseSRPN relies only on a single bounding box initialization and produces human joint locations. The experimental results show that, while maintaining good pose estimation accuracy on the COCO and PoseTrack datasets, the proposed method achieves a speed of 59 frames/s, which is superior to other pose tracking frameworks.
Abstract: With the rapid growth of digital videos and data
communications, video summarization, which provides a shorter version
of the video for fast video browsing and retrieval, is necessary.
Key frame extraction is one of the mechanisms to generate video
summary. In general, the extracted key frames should both represent
the entire video content and contain minimum redundancy. However,
most of the existing approaches heuristically select key frames; hence,
the selected key frames may not be the most different frames and/or
not cover the entire content of a video. In this paper, we propose
a method of video summarization which provides reasonable
objective functions for selecting key frames. In particular, we apply
a statistical dependency measure called quadratic mutual information
as our objective functions for maximizing the coverage of the
entire video content as well as minimizing the redundancy among
selected key frames. The proposed key frame extraction algorithm
finds key frames by solving an optimization problem. Through experiments,
we demonstrate the success of the proposed video summarization
approach, which produces video summaries with better coverage of
the entire video content and less redundancy among key frames
than the state-of-the-art approaches.
Abstract: Demand for devices capable of real-time video transmission is ever-increasing, and high-resolution videos have made efficient video compression techniques an essential component for capturing and transmitting video data. Motion estimation has a critical role in encoding raw video. Hence, various motion estimation methods have been introduced to compress video efficiently. Motion estimation methods based on low bit-depth representation simplify the computation of matching criteria and thus provide a small hardware footprint. In this paper, a hardware implementation of a two-bit transformation based low-complexity motion estimation method using a local binary pattern approach is proposed. Image frames are represented in two-bit depth instead of full depth by using the local binary pattern as a binarization approach, and the binarization part of the hardware architecture is explained in detail. Experimental results demonstrate the difference between the proposed hardware architecture and the architectures of well-known low-complexity motion estimation methods in terms of important aspects such as resource utilization, energy, and power consumption.
Abstract: An increasing number of mobile devices with integrated
cameras has meant that most digital video comes from these devices.
These digital videos can be made anytime, anywhere and for different
purposes. They can also be shared on the Internet in a short period
of time and may sometimes contain recordings of illegal acts. The
need to reliably trace the origin becomes evident when these videos
are used for forensic purposes. This work proposes an algorithm
to identify the brand and model of the mobile device that generated
the video. Its procedure is as follows: after obtaining the relevant
video information, a classification algorithm based on sensor noise
and Wavelet Transform performs the aforementioned identification
process. We also present experimental results that support the validity
of the techniques used and show promising results.
Abstract: Advertising is commonly used to foster the sales and reputation of an institution. It was first the growth of print advertising that increased the readership and number of newspaper periodicals and their circulation. The rise of the Internet and online media has somewhat blurred the roles of media and advertising, though the intention is still to reach an audience and to increase sales. The relationship between advertising and audience in product purchase through persuasion has developed from print media to online media. Given the changing media environment and audience, this research studies the impact of online advertising on this relationship cycle. The content of online advertisements consists largely of text, multimedia, photos, audio, and video. Messages in these content formats may indeed have an impact on the audience and on credibility. This study therefore examines the effectiveness of online advertisements and their influence on the purchasing behavior of Generation Y. It uses Media Dependency Theory to analyze the relationship between the impact of online advertisements and the media usage patterns of Generation Y. The Hierarchy of Effectiveness Model is used as a marketing communication model to study the effectiveness of advertising and to determine the impact of online advertisements on the purchase decision making of Generation Y. This research uses an online survey to reach the sample of Generation Y. The results show that online advertisements do not have much effect on purchase decision making, even though Generation Y relies heavily on media content, including online advertisements, for information and believes in its credibility. A few other external factors may interfere with the effectiveness of online advertising. The most obvious influence on purchasing behavior actually comes from peers.
Abstract: This paper presents a self-sustaining mobile system for
counting and classification of vehicles through processing video. It
proposes a counting and classification algorithm divided into four steps
that can be executed multiple times in parallel on an SBC (Single
Board Computer), such as the Raspberry Pi 2, in such a way that it
can be implemented in real time. The first step of the proposed
algorithm limits the zone of the image that will be processed.
The second step performs the detection of the mobile objects using
a BGS (Background Subtraction) algorithm based on the GMM
(Gaussian Mixture Model), as well as a shadow removal algorithm
using physical-based features, followed by morphological operations.
In the third step, vehicle detection is performed using
edge detection algorithms and vehicle tracking using Kalman
filters. The last step of the proposed algorithm registers each vehicle
passing and classifies the vehicles according to their areas.
An auto-sustainable system is proposed, powered by batteries and
photovoltaic solar panels, and the data transmission is done through
GPRS (General Packet Radio Service), eliminating the need for
external cables, which will facilitate its deployment and relocation to
any location where it could operate. The self-sustaining trailer will
allow the counting and classification of vehicles in specific zones
with difficult access.
Abstract: Gestures play a major role in comprehension and
memory recall because they help convey meaning efficiently
and support listeners’ comprehension and memory. In
the present study, the assistance of two kinds of gestures (iconic
and beat gestures) is tested with regard to memory and recall. The
hypothesis investigated here is whether or not iconic and beat gestures
provide assistance in memory and recall in Greek and in Greek
speakers’ second language. Two groups of participants were formed,
one comprising Greeks who reside in Athens and one comprising Greeks
who reside in Copenhagen. Three kinds of stimuli were used: a video
with words accompanied by iconic gestures, a video with words
accompanied by beat gestures, and a video with words alone. The
languages used are Greek and English. The words in the English
videos were spoken by a native English speaker and by a Greek
speaker speaking English. The reason for this is that when it comes to
beat gestures that serve a meta-cognitive function and are generated
according to the intonation of a language, prosody plays a major
role. Thus, participants that have different influences in prosody may
generate different results from rhythmic gestures. Memory recall was
assessed by asking the participants to try to remember as many
words as they could after viewing each video. Results show that
iconic gestures provide significant assistance in memory and recall
in Greek and in English whether they are produced by a native or
a second language speaker. In the case of beat gestures though, the
findings indicate that beat gestures may not play such a significant
role in the Greek language. As far as intonation is concerned, no significant
difference was found between beat gestures produced by a
native English speaker and by a Greek speaker speaking English.
Abstract: Blood cell analysis plays a significant role in the diagnosis of human health. As an alternative to the traditional technique conducted by laboratory technicians, this paper presents an automatic white blood cell (leukocyte) detection system using Image Stitching and Color Overlapping Windows. The advantage of this method is that it provides a white blood cell detection technique that is robust to imperfect blood cell shapes and varying image quality. The input for this application is a set of images from a microscope-slide translation video. The preprocessing stage stitches the input images: first, the overlapping parts of the images are determined, then the stitching and blending of two input images are performed. Next, Color Overlapping Windows is applied for white blood cell detection; it consists of color filtering, window candidate checking, window marking, window overlap finding, and window cropping. Experimental results show that this method achieves an average detection accuracy of 82.12% on the leukocyte images.
Abstract: In this paper, the design of an H.263-based wireless video
transceiver is presented for a wireless camera system. It uses a standard
Wi-Fi transceiver, and the coverage range is up to 100 m. Furthermore, the
standard H.263 video encoding technique is used for video
compression, since the wireless video transmitter is unable to transmit
high-capacity raw data in real time. The implemented system is capable
of streaming at a rate of less than 1 Mbps using NTSC 720x480 video.
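To make the need for compression concrete, the raw data rate of an NTSC 720x480 stream can be estimated with simple arithmetic. The sketch below assumes 30 frames per second and 12 bits per pixel (YUV 4:2:0 sampling). Neither value is stated in the abstract, and the function name is illustrative.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed values: 12 bits/pixel (YUV 4:2:0) and 30 frames/s are NOT from the abstract.
raw = raw_bitrate_bps(720, 480, 12, 30)
target = 1_000_000  # the 1 Mbps upper bound mentioned in the abstract

print(f"raw rate: {raw / 1e6:.1f} Mbps")
print(f"compression ratio needed: at least {raw / target:.0f}:1")
```

Under these assumptions the encoder has to reduce the data rate by roughly two orders of magnitude to fit the stated streaming rate.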
Abstract: This paper presents the design and implementation
details of a complete unmanned aerial system (UAS) based
on commercial-off-the-shelf (COTS) components, focusing on
safety, security, search and rescue scenarios in GPS-denied
environments. In particular, the aerial platform is capable
of semi-autonomously navigating through extremely low-light,
GPS-denied indoor environments based on onboard sensors only,
including a downward-facing optical flow camera. In addition, a
low-cost payload camera system is developed to stream
both infrared video and visible-light video to a ground station in
real time, for the purpose of detecting signs of life and hidden humans.
The total cost of the complete system is estimated to be $1150,
and the effectiveness of the system has been tested and validated
in practical scenarios.
Abstract: Background subtraction and temporal difference are
often used for moving object detection in video. Both approaches are
computationally simple and easy to deploy in real-time image
processing. However, while background subtraction is highly
sensitive to dynamic backgrounds and illumination changes, the
temporal difference approach is poor at extracting the relevant pixels of
the moving object and at detecting stopped or slowly moving
objects in the scene. In this paper, we propose a simple moving object
detection scheme based on adaptive background subtraction and
temporal difference exploiting dynamic background updates. The
proposed technique consists of histogram equalization, a linear
combination of background and temporal difference, followed by the
novel frame-based and pixel-based background updating techniques.
Finally, morphological operations are applied to the output images.
Experimental results show that the proposed algorithm can overcome the
drawbacks of both background subtraction and temporal difference
methods and can provide better performance than either method alone.
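The core idea of combining the two difference signals can be sketched in a few lines. This is only an illustration under stated assumptions: the weight `w`, the threshold, the learning rate `alpha`, and the selective running-average background update are hypothetical choices, not the paper's actual parameters or update rules. Histogram equalization, the frame-based update, and the morphological post-processing are omitted.

```python
def detect_moving_pixels(frame, prev_frame, background,
                         w=0.5, threshold=30, alpha=0.05):
    """Return (mask, updated_background) for one grayscale frame.

    frame, prev_frame, background: equally sized 2D lists of intensities.
    w, threshold and alpha are illustrative values, not taken from the paper.
    """
    mask, new_background = [], []
    for row, prev_row, bg_row in zip(frame, prev_frame, background):
        mask_row, bg_out = [], []
        for pixel, prev_pixel, bg_pixel in zip(row, prev_row, bg_row):
            bg_diff = abs(pixel - bg_pixel)        # background subtraction
            temp_diff = abs(pixel - prev_pixel)    # temporal difference
            score = w * bg_diff + (1 - w) * temp_diff  # linear combination
            moving = score > threshold
            mask_row.append(1 if moving else 0)
            # Assumed pixel-based update: adapt the background only where
            # no motion was detected, using a running average.
            bg_out.append(bg_pixel if moving
                          else (1 - alpha) * bg_pixel + alpha * pixel)
        mask.append(mask_row)
        new_background.append(bg_out)
    return mask, new_background


# Tiny example: one bright pixel appears against a static background.
background = [[10, 10, 10], [10, 10, 10]]
previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 200, 10], [12, 10, 10]]
mask, background = detect_moving_pixels(current, previous, background)
print(mask)
```

Freezing the background under detected motion is one common way to keep stopped objects from being absorbed too quickly; the paper's own update techniques may differ.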
Abstract: Key frame extraction methods select the most
representative frames of a video, which can be used in different areas
of video processing such as video retrieval, video summary, and video
indexing. In this paper, we present a novel approach for extracting key
frames from video sequences. Each frame is characterized uniquely by
its contours, which are represented by dominant blocks. These
dominant blocks are located on the contours and their nearby textures.
When the video frames change noticeably, their dominant
blocks change, and a key frame can then be extracted. The dominant
blocks of every frame are computed, and then feature vectors are
extracted from the dominant-block image of each frame and arranged
in a feature matrix. Singular Value Decomposition is used to calculate
the sliding-window ranks of those matrices. Finally, the computed ranks
are traced, which allows the key frames of a video to be extracted.
Experimental results show that the proposed approach is robust
against a large range of digital effects used during shot transition.
Abstract: The proliferation of multimedia technology and services in today’s world provides ample research scope at the frontiers of visual signal processing. The widespread use of video-based applications in heterogeneous environments requires viable methods of Video Quality Assessment (VQA). The evaluation of video quality not only depends on high QoS requirements but also emphasizes the need for the novel term ‘QoE’ (Quality of Experience), which treats video quality as user-centric. This paper discusses two vital video quality assessment methods, namely subjective and objective assessment. The evolution of various video quality metrics, their classification models, and their applications are reviewed in this work. Mean Opinion Score (MOS) based subjective measurements and algorithm-based objective metrics are discussed, and their challenges are outlined. Further, this paper explores the recent progress of VQA in emerging technologies such as mobile video and 3D video.
Abstract: This paper presents the results of primary quantitative research on viral advertising, with a focus on the popularity of viral video and the willingness to share it among the Czech Internet population. It starts with a brief theoretical debate on viral advertising, which is used for comparison with the results. To collect data, an online questionnaire survey was given to 384 respondents. The statistics used in this research included frequency, percentage, correlation, and Pearson’s Chi-square test. Data was evaluated using SPSS software. The analysis revealed high popularity of viral advertising video among the Czech Internet population but implies a lower willingness to share it. A significant relationship was found between the likability of the viral video technique and the age of the viewer.
Abstract: In this paper, we present the development of a mobile application for Social Guidance and Counseling (GC) called “m-NingBK: Social GC”. The application is used for GC services that run on mobile devices and is designed specifically for junior high school students. The method combines interactive multimedia approaches and educational psychology. Accordingly, the design process is carried out in three stages: digitizing the material of social GC services, visualizing it wisely, and making it interactive. This method is intended to let students not only hear and see but also "do" things virtually. The multimedia application "m-NingBK: Social GC" uses five components: text, images/graphics, audio/sound, animation, and video. The application provides four menus: potential self, social, Expert System, and about. The application is built using the Java programming language and was tested on a smartphone running the Android operating system. Based on the test, people rated the application as follows: 16.7% excellent, 61.1% good, 19.4% adequate, and 2.8% poor.
Abstract: Nowadays, video data embedding is a challenging and interesting approach to keeping real-time video data secure. This technique can be implemented and used in high-level applications. The rate-distortion performance of any image is not guaranteed, because the gain provided by accurate image frame segmentation is offset by the inefficiency of coding objects of arbitrary shape, with many factors, such as losses, that depend on both the coding scheme and the object structure. By using a rate controller in association with the encoder, one can dynamically adjust the target bitrate. This paper discusses how to keep videos secure by mixing in signature data with negligible distortion of the original video, keeping the steganographic video as close as possible to the quality of the original. We propose a method for embedding the signature data into separate video frames using the block Discrete Cosine Transform. These frames are then encoded using real-time H.264 encoding. After processing, recovery of the original video and the signature data at the receiver end is proposed.
Abstract: We present a hardware oriented method for real-time
measurements of object position in video. The targeted application
area is light spots used as references for robotic navigation. Different
algorithms for dynamic thresholding are explored in combination
with component labeling and Center Of Gravity (COG) for highest
possible precision versus Signal-to-Noise Ratio (SNR). This method
was developed with a focus on low hardware cost, requiring only one
convolution operation for preprocessing of data.
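A Center Of Gravity computation over thresholded pixels can be sketched as follows. This is a software illustration only, not the hardware method described above. It assumes a single light spot per image, so component labeling is omitted, and the default threshold (halfway between the darkest and brightest pixel) is a hypothetical stand-in for the dynamic thresholding algorithms the abstract mentions.

```python
def center_of_gravity(image, threshold=None):
    """Intensity-weighted centroid (x, y) of the pixels above a threshold.

    image: 2D list of grayscale intensities containing a single light spot.
    If threshold is None, it is set halfway between the darkest and the
    brightest pixel (an illustrative choice, not one from the paper).
    """
    if threshold is None:
        flat = [p for row in image for p in row]
        threshold = (min(flat) + max(flat)) / 2
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel > threshold:
                weight = pixel - threshold  # weight by intensity above threshold
                total += weight
                sum_x += weight * x
                sum_y += weight * y
    if total == 0:
        return None  # no pixel exceeded the threshold
    return sum_x / total, sum_y / total


spot = [
    [0, 0, 0, 0],
    [0, 90, 100, 0],
    [0, 90, 100, 0],
    [0, 0, 0, 0],
]
print(center_of_gravity(spot))
```

Because the right-hand column of the spot is brighter, the estimated x coordinate lies slightly to the right of the geometric center, which illustrates how intensity weighting yields sub-pixel position estimates.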
Abstract: This paper presents an evaluation for a wavelet-based
digital watermarking technique used in estimating the quality of
video sequences transmitted over an Additive White Gaussian Noise
(AWGN) channel in terms of a classical objective metric, such as
Peak Signal-to-Noise Ratio (PSNR), without the need for the original
video. In this method, a watermark is embedded into the Discrete
Wavelet Transform (DWT) domain of the original video frames
using a quantization method. The degradation of the extracted
watermark can be used to estimate the video quality in terms of
PSNR with good accuracy. We calculated PSNR for video frames
contaminated with AWGN and compared the values with those
estimated using the Watermarking-DWT based approach. It is found
that the calculated and estimated quality measures of the video
frames are highly correlated, suggesting that this method can provide
a good quality measure for video frames transmitted over an AWGN
channel without the need for the original video.
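For reference, the calculated (full-reference) PSNR that the estimates are compared against follows the standard definition, 10 log10(MAX^2 / MSE). The sketch below computes it for two small frames. It assumes 8-bit samples (`max_value=255`), and the function name is illustrative. The watermark-based estimation itself is not shown, because the abstract does not give its details.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized 2D frames (lists of lists).

    max_value=255 assumes 8-bit samples, which the abstract does not state.
    """
    squared_error = 0.0
    count = 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


original = [[100, 100], [100, 100]]
noisy = [[105, 95], [100, 100]]
print(f"{psnr(original, noisy):.2f} dB")
```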
Abstract: Packet-switched data networks such as the Internet, which have
traditionally supported throughput-sensitive applications such as email
and file transfer, are increasingly supporting delay-sensitive
multimedia applications such as interactive video. These delay-sensitive
applications would often rather sacrifice some throughput
for lower delay. Unfortunately, the current packet-switched network
does not offer choices, but instead provides monolithic best-effort
service to all applications. This paper evaluates Class Based Queuing
(CBQ), Coordinated Earliest Deadline First (CEDF), Weighted
Switch Deficit Round Robin (WSDRR) and RED-Boston scheduling
schemes that are sensitive to delay bound expectations for a variety of
real-time applications, and an enhancement of WSDRR is proposed.
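The abstract does not describe how WSDRR or its enhancement work. As background only, the following sketch shows classic Deficit Round Robin, the well-known scheduling scheme that the "Deficit Round Robin" part of the name refers to. It is not the WSDRR scheme evaluated in the paper, and the function name and the packet sizes are illustrative.

```python
from collections import deque


def deficit_round_robin(queues, quantum):
    """Return the packet transmission order under classic DRR.

    queues: list of lists of packet sizes in bytes, one list per flow.
    quantum: bytes of credit added to a backlogged flow on each visit.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    pending = [deque(q) for q in queues]
    deficit = [0] * len(pending)
    sent = []  # (flow index, packet size) in transmission order
    while any(pending):
        for i, queue in enumerate(pending):
            if not queue:
                continue
            deficit[i] += quantum
            # Send head-of-line packets while the accumulated credit covers them.
            while queue and queue[0] <= deficit[i]:
                size = queue.popleft()
                deficit[i] -= size
                sent.append((i, size))
            if not queue:
                deficit[i] = 0  # an emptied flow does not keep its credit
    return sent


# Flow 0 sends large packets, flow 1 sends small ones.
order = deficit_round_robin([[300, 300], [100, 100, 100]], quantum=200)
print(order)
```

In this example flow 0 has to accumulate credit over two visits before its first 300-byte packet is sent, so the flow with small packets is not starved by the flow with large ones.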
Abstract: Camera calibration plays an important role in the analysis of sports video. In soccer video, in most cases, the cross-points that can be used for calibration at the center of the soccer field are not sufficient. This paper therefore introduces a new automatic camera calibration algorithm that focuses on solving this problem by using the properties of the images of the center circle, the halfway line, and a touch line. After the theoretical analysis, a practicable automatic algorithm is proposed. Although very little information is used, the results of experiments with both synthetic and real data show that the algorithm is applicable.