Abstract: Tracking of moving people has gained great importance due to rapid technological advancements in the field of computer vision. The objective of this study is to design a motion-based method for detecting and tracking multiple pedestrians walking randomly in different directions. In our proposed method, a Gaussian mixture model (GMM) is used to detect moving persons in image sequences. It adapts to changes that take place in the scene, such as varying illumination and moving objects that start and stop often. Background noise in the scene is eliminated by applying morphological operations, and the motion of tracked people is determined using the Kalman filter. The Kalman filter is applied to predict the tracked location in each frame and to determine the likelihood of each detection. We used a benchmark data set for the evaluation, recorded by a stationary camera mounted on a side wall. The scenes in the data set were taken on a street and include up to eight people in front of the camera in two different scenes, with durations of 53 and 35 seconds, respectively. When pedestrians walk in close proximity, the proposed method achieves a detection ratio of 87% and a tracking ratio of 77%. When they are farther apart from each other, the detection ratio increases to 90% and the tracking ratio increases to 79%.
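The prediction and likelihood steps attributed to the Kalman filter above can be illustrated with a minimal sketch. It assumes a one-dimensional constant-velocity motion model, a position-only measurement, and illustrative noise values `q` and `r`; none of these choices, nor the function names, come from the abstract.

```python
import math


def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of a 1D constant-velocity Kalman filter.

    x = [position, velocity], P = 2x2 covariance (nested lists),
    z = measured position. q and r are assumed noise variances.
    Returns the new state, new covariance, and (predicted position,
    innovation variance) for scoring detections.
    """
    # Predict: x' = F x with F = [[1, dt], [0, 1]]
    px = x[0] + dt * x[1]
    pv = x[1]
    # P' = F P F^T + Q, with Q = q * I as a simplifying assumption
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with measurement matrix H = [1, 0]
    s = p00 + r                  # innovation variance
    k0, k1 = p00 / s, p10 / s    # Kalman gain
    y = z - px                   # innovation
    x_new = [px + k0 * y, pv + k1 * y]
    P_new = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return x_new, P_new, (px, s)


def detection_likelihood(z, predicted, s):
    """Gaussian likelihood of a detection z given the predicted position."""
    return math.exp(-0.5 * (z - predicted) ** 2 / s) / math.sqrt(2 * math.pi * s)


# Usage: follow a target that moves one unit per frame.
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for t in range(1, 21):
    x, P, (pred, s) = kalman_step(x, P, float(t))
```

In a multi-target setting, the likelihood would typically be used to decide which detection is assigned to which track; that assignment step is not shown here.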
Abstract: The motivation of our work is to detect different
terrain types traversed by a robot based on acoustic data from the
robot-terrain interaction. Different acoustic features and classifiers
were investigated, such as Mel-frequency cepstral coefficients and
Gammatone frequency cepstral coefficients for feature extraction,
and a Gaussian mixture model and a feed-forward neural network for
classification. We analyze the system’s performance by comparing
our proposed techniques with some other features surveyed from
distinct related works. We achieve precision and recall values between
87% and 100% per class, and an average accuracy of 95.2%. We also
study the effect of varying audio chunk size in the application phase
of the models and find only a mild impact on performance.
Abstract: As 3D video has been a hot research topic over the last few decades, free-viewpoint TV (FTV) is a promising field owing to its better visual experience and unmatched interactivity. View synthesis is a crucial technology for FTV; it makes it possible to render images at an unlimited number of virtual viewpoints using information from a limited number of reference views. In this paper, a novel hybrid synthesis framework is proposed and blending priority is explored. In contrast to the commonly used View Synthesis Reference Software (VSRS), the presented synthesis process takes the temporal correlation of image sequences into account. The temporal correlations are exploited to produce fine synthesis results even near foreground boundaries. As for the blending priority, the proposed scheme selects one of the two reference views as the main reference view based on the distance between the reference views and the virtual view; the other view is chosen as the auxiliary viewpoint and only assists in filling hole pixels with the help of background information. The proposed approach shows significant improvement over the state-of-the-art pixel-based virtual view synthesis method: the experimental results show that subjective gains can be observed, objective PSNR average gains range from 0.5 to 1.3 dB, and SSIM average gains range from 0.01 to 0.05.
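The blending priority described above can be sketched roughly as follows. This is only an illustration under strong assumptions: both reference views are taken to be already warped to the virtual viewpoint, hole pixels are marked with `None`, and the function name `blend_views` is hypothetical. The temporal-correlation part of the framework is not modelled.

```python
def blend_views(left, right, dist_left, dist_right):
    """Blend two warped reference views (2D lists of pixel values).

    The view closer to the virtual viewpoint becomes the main view;
    the other (auxiliary) view is used only where the main view has a hole.
    """
    main, aux = (left, right) if dist_left <= dist_right else (right, left)
    return [
        [m if m is not None else a for m, a in zip(main_row, aux_row)]
        for main_row, aux_row in zip(main, aux)
    ]


# Usage: the left view is closer, so only its hole is filled from the right.
left = [[1, None], [3, 4]]
right = [[9, 8], [7, 6]]
blended = blend_views(left, right, dist_left=0.3, dist_right=0.7)
```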
Abstract: This paper presents a self-sustaining mobile system for
counting and classification of vehicles through processing video. It
proposes a counting and classification algorithm divided into four steps
that can be executed multiple times in parallel on an SBC (Single
Board Computer), such as the Raspberry Pi 2, in such a way that it
can be implemented in real time. The first step of the proposed
algorithm limits the zone of the image that will be processed.
The second step performs the detection of the mobile objects using
a BGS (Background Subtraction) algorithm based on the GMM
(Gaussian Mixture Model), as well as a shadow removal algorithm
using physical-based features, followed by morphological operations.
In the third step, vehicle detection is performed using
edge detection algorithms and vehicle tracking through Kalman
filters. The last step of the proposed algorithm registers the vehicle
passing and performs their classification according to their areas.
An auto-sustainable system is proposed, powered by batteries and
photovoltaic solar panels, and the data transmission is done through
GPRS (General Packet Radio Service), eliminating the need for
external cabling, which facilitates its deployment and relocation to
any location where it could operate. The self-sustaining trailer will
allow the counting and classification of vehicles in specific zones
with difficult access.
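The last step, classification according to area, can be sketched as a simple threshold lookup. The class names and pixel thresholds below are purely hypothetical placeholders; the abstract does not state which classes or thresholds are used, and real values would depend on the camera geometry.

```python
from collections import Counter

# Hypothetical (upper bound in pixels, label) pairs, in increasing order.
AREA_CLASSES = [(1500, "motorcycle"), (6000, "car"), (float("inf"), "truck")]


def classify_by_area(area_px):
    """Return the label of the first class whose upper bound exceeds the area."""
    for upper, label in AREA_CLASSES:
        if area_px < upper:
            return label


def count_vehicles(areas):
    """Count registered vehicle passes per class from their blob areas."""
    return Counter(classify_by_area(a) for a in areas)


# Usage: four registered passes with their measured blob areas.
counts = count_vehicles([900, 3000, 3200, 9000])
```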
Abstract: Real-time video tracking is a challenging task for computing professionals. The performance of video tracking techniques is greatly affected by the background detection and elimination process. Local regions of the image frame contain vital information about the background and foreground. However, with traditional approaches, pixel-level processing of local regions consumes considerable computation time and memory. In our approach, we exploit the concurrent computational ability of General Purpose Graphic Processing Units (GPGPU) to address this problem. The Gaussian Mixture Model (GMM) with adaptive weighted kernels is used for detecting the background. The weights of the kernel are influenced by local regions and are updated by inter-frame variations of these corresponding regions. The proposed system has been tested with GPU devices such as GeForce GTX 280, GeForce GTX 280 and Quadro K2000. The results are encouraging, with a maximum speedup of 10x compared to the sequential approach.
Abstract: Speaker Identification (SI) is the task of establishing
identity of an individual based on his/her voice characteristics. The SI
task is typically achieved by two-stage signal processing: training and
testing. The training process calculates speaker specific feature
parameters from the speech and generates speaker models
accordingly. In the testing phase, speech samples from unknown
speakers are compared with the models and classified. Even though
performance of speaker identification systems has improved due to
recent advances in speech processing techniques, there is still a need for
improvement. In this paper, a Closed-Set Text-Independent Speaker
Identification System (CISI) based on a Multiple Classifier System
(MCS) is proposed, using Mel Frequency Cepstrum Coefficient
(MFCC) as feature extraction and suitable combination of vector
quantization (VQ) and Gaussian Mixture Model (GMM) together
with Expectation Maximization algorithm (EM) for speaker
modeling. The use of Voice Activity Detector (VAD) with a hybrid
approach based on Short Time Energy (STE) and Statistical
Modeling of Background Noise in the pre-processing step of the
feature extraction yields a better and more robust automatic speaker
identification system. In addition, using the Linde-Buzo-Gray (LBG)
clustering algorithm to initialize the GMM, whose underlying
parameters are estimated in the EM step, improved the convergence rate
and system performance. The system also uses a relative index as a
confidence measure when the GMM and VQ identification results
contradict each other. Simulation results carried out on the voxforge.org
speech database using MATLAB highlight the efficacy of the
proposed method compared to earlier work.
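The Short Time Energy part of the voice activity detector can be sketched as below. This is a minimal illustration only: the frame length, hop size, and the threshold rule (a fixed fraction of the maximum frame energy) are assumptions, and the statistical background-noise modelling that the hybrid approach also relies on is not included.

```python
def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples for each frame of the signal."""
    return [
        sum(s * s for s in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def vad_mask(signal, frame_len=4, hop=4, ratio=0.1):
    """Mark frames as speech when their energy exceeds ratio * max energy."""
    energies = short_time_energy(signal, frame_len, hop)
    if not energies:
        return []
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]


# Usage: quiet, loud, quiet. Only the loud frames are kept for MFCC extraction.
signal = [0.01] * 8 + [1.0, -1.0] * 4 + [0.01] * 4
mask = vad_mask(signal)
```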
Abstract: In this paper, we propose a moving object detection
method that helps a driver safely take his/her car out of a
parking lot. When moving objects such as motorbikes, pedestrians,
other cars, and obstacles are detected at the rear side of the host
vehicle, the proposed algorithm can warn the driver. We
assume that the host vehicle is about to depart. Gaussian
Mixture Model (GMM) based background subtraction is
applied as the basic method. Pre-processing such as smoothing and
post-processing such as morphological filtering are added. We examine “which color space
has better performance for detection of moving objects?” Three color
spaces including RGB, YCbCr, and Y are applied and compared, in
terms of detection rate. Through simulation, we show that the RGB
space is more suitable for moving object detection based on
background subtraction.
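For reference, the three color spaces compared above are related by a linear transform. The sketch below uses the full-range ITU-R BT.601 coefficients as used in JFIF; this is an assumption, since the abstract does not say which RGB to YCbCr conversion was applied.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 coefficients, as in JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def rgb_to_y(r, g, b):
    """Luminance-only representation (the 'Y' color space)."""
    return rgb_to_ycbcr(r, g, b)[0]


# Usage: a grey pixel has neutral chroma, so Cb and Cr are both about 128.
y, cb, cr = rgb_to_ycbcr(100, 100, 100)
```

The Y space keeps one channel per pixel, so a GMM background model built on it is cheaper than one built on RGB or YCbCr, at the cost of discarding chroma.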
Abstract: This paper presents two techniques, local feature
extraction using the image spectrum and low-frequency spectrum
modelling using a GMM to capture the underlying statistical
information, to improve the performance of a face recognition
system. Local spectrum features are extracted using overlapping
sub-block windows that are mapped onto the face image. For each
block, the spatial domain is transformed to the frequency domain using
the DFT. Low-frequency coefficients are preserved, and high-frequency
coefficients discarded, by applying a rectangular mask to the
spectrum of the facial image. Low-frequency information is
non-Gaussian in the feature space, and by using a combination of several
Gaussian functions that have different statistical properties, the best
feature representation can be modelled using a probability density
function. The recognition process is performed using the maximum
likelihood value computed using pre-calculated GMM components.
The method is tested using the FERET datasets and is able to achieve
a 92% recognition rate.
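The maximum-likelihood recognition step can be sketched as follows. The sketch assumes diagonal-covariance GMMs and independent feature vectors, and the names `gmm_log_likelihood` and `recognize` are illustrative; the abstract does not specify the covariance structure or how block scores are combined.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    components: list of (weight, means, variances) tuples.
    """
    logs = []
    for weight, means, variances in components:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def recognize(features, models):
    """Return the identity whose GMM gives the highest total log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(x, gmm) for x in features)
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get)


# Usage with two toy "pre-calculated" models (values are made up).
models = {
    "A": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "B": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [-5.0, -5.0], [1.0, 1.0])],
}
identity = recognize([[0.1, -0.2], [0.3, 0.0]], models)
```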
Abstract: Over the past few years, the online multimedia
collection has grown at a fast pace. Several companies have shown
interest in studying ways to organise this volume of audio
information without the need for human intervention to generate
metadata. In the past few years, many applications have emerged on
the market which are capable of identifying a piece of music in a
short time. Different audio effects and degradation make it much
harder to identify the unknown piece. In this paper, an audio
fingerprinting system which makes use of a non-parametric based
algorithm is presented. Parametric analysis is also performed using
Gaussian Mixture Models (GMMs). The feature extraction methods
employed are the Mel Spectrum Coefficients and the MPEG-7 basic
descriptors. Bin numbers replaced the extracted feature coefficients
during the non-parametric modelling. The results show that non-parametric
analysis offers promising results, comparable to those reported in
the literature.
Abstract: The 3D body movement signals captured during
human-human conversation include clues not only to the content of
people’s communication but also to their culture and personality.
This paper is concerned with automatic extraction of this information
from body movement signals. For the purpose of this research, we
collected a novel corpus from 27 subjects and arranged them into groups
according to their culture. We arranged each group into pairs, and
each pair communicated with each other about different topics.
A state-of-the-art recognition system is applied to the problems of
person, culture, and topic recognition. We borrowed modeling,
classification, and normalization techniques from speech recognition.
We used Gaussian Mixture Modeling (GMM) as the main technique
for building our three systems, obtaining 77.78%, 55.47%, and
39.06% from the person, culture, and topic recognition systems
respectively. In addition, we combined the above GMM systems with
Support Vector Machines (SVM) to obtain 85.42%, 62.50%, and
40.63% accuracy for person, culture, and topic recognition
respectively.
Although direct comparison among these three recognition
systems is difficult, it seems that our person recognition system
performs best for both GMM and GMM-SVM, suggesting that intersubject
differences (i.e. subject’s personality traits) are a major
source of variation. When removing these traits from culture and
topic recognition systems using the Nuisance Attribute Projection
(NAP) and the Intersession Variability Compensation (ISVC)
techniques, we obtained 73.44% and 46.09% accuracy from culture
and topic recognition systems respectively.
Abstract: In this paper, Fuzzy C-Means clustering with
Expectation Maximization-Gaussian Mixture Model based hybrid
modeling algorithm is proposed for Continuous Tamil Speech
Recognition. Speech sentences from various speakers are used
for the training and testing phases, and objective measures are compared
between the proposed and existing Continuous Speech Recognition algorithms.
From the simulated results, it is observed that the proposed algorithm
improves the recognition accuracy and F-measure by up to 3%
compared to the existing algorithms for speech signals from
various speakers. In addition, it reduces the Word Error Rate, Error
Rate and Error by up to 4% compared to the existing
algorithms. In all aspects, the proposed hybrid modeling for Tamil
speech recognition provides significant improvements for
speech-to-text conversion in various applications.
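Of the objective measures named above, the Word Error Rate has a standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. A minimal sketch, assuming whitespace tokenisation (the abstract does not describe how its scores were computed):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


# Usage: one substitution and one deletion against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```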
Abstract: This paper proposes a hierarchical hidden Markov model (HHMM) to model the detection of M vehicles in a wireless sensor network (WSN). The HHMM contains an extra level of hidden Markov model to model the temporal transitions of each state of the first HMM. By modeling the temporal transitions, only those hypotheses with nonzero transition probabilities need to be tested. Thus, this method efficiently reduces the computation load, which is preferable in WSN applications. This paper integrates several techniques to optimize the detection performance. The output of the states of the first HMM is modeled as a Gaussian Mixture Model (GMM), where the number of states and the number of Gaussians are experimentally determined, while the other parameters are estimated using Expectation Maximization (EM). The HHMM is used to model the sequence of the local decisions, which are based on multiple hypothesis testing with a maximum likelihood approach. The states in the HHMM represent various combinations of vehicles of different types. Due to the statistical advantages of multisensor data fusion, we propose a heuristic based on fuzzy weighted majority voting to enhance cooperative classification of moving vehicles within a region that is monitored by a wireless sensor network. A fuzzy inference system weighs each local decision based on the signal-to-noise ratio of the acoustic signal for target detection and the signal-to-noise ratio of the radio signal for sensor communication. The spatial correlation among the observations of neighboring sensor nodes is efficiently utilized, as well as the temporal correlation. Simulation results demonstrate the efficiency of this scheme.
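The weighted majority voting idea can be sketched as below. Note the stand-in: instead of a full fuzzy inference system, the weight here is the minimum (a common fuzzy AND) of two linearly normalised SNR values. The 0 to 30 dB range, the function names, and the labels are assumptions made only for illustration.

```python
from collections import defaultdict


def snr_weight(acoustic_snr_db, radio_snr_db, lo=0.0, hi=30.0):
    """Map two SNR values to a weight in [0, 1] using min as a fuzzy AND."""
    def membership(value):
        return min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return min(membership(acoustic_snr_db), membership(radio_snr_db))


def weighted_majority_vote(decisions):
    """decisions: iterable of (label, weight); returns the heaviest label."""
    totals = defaultdict(float)
    for label, weight in decisions:
        totals[label] += weight
    return max(totals, key=totals.get)


# Usage: one reliable sensor outvotes two unreliable ones.
votes = [
    ("truck", snr_weight(25, 20)),
    ("car", snr_weight(5, 10)),
    ("car", snr_weight(8, 6)),
]
fused = weighted_majority_vote(votes)
```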
Abstract: This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme that incorporates the advantages of both the generative and discriminative model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is assigned to its corresponding group, and evaluation using Gaussian mixture models (GMMs) is then performed. Experimental results show that the proposed method can significantly improve the performance of the text-independent speaker identification task. We report improvements of up to a 50% reduction in identification error rate compared to the baseline statistical model.
Abstract: Recently, much research has been conducted to
retrieve pertinent parameters and adequate models for automatic
music genre classification. In this paper, two measures based upon
information theory concepts are investigated for mapping the feature
space to the decision space. A Gaussian Mixture Model (GMM) is used
as a baseline and reference system. Various strategies are proposed
for training and testing sessions with matched or mismatched
conditions, long training and long testing, long training and short
testing. For all experiments, the file sections used for testing are
never used during training. With matched conditions, all
examined measures yield the best and similar scores (almost 100%).
With mismatched conditions, the proposed measures yield better
scores than the GMM baseline system, especially for the short testing
case. It is also observed that the average discrimination information
measure is most appropriate for music category classifications and on
the other hand the divergence measure is more suitable for music
subcategory classifications.
Abstract: Image clustering is a process of grouping images
based on their similarity. The image clustering usually uses the color
component, texture, edge, shape, or mixture of two components, etc.
This research aims to explore image clustering using color
composition. In order to complete this image clustering, three main
components should be considered, which are color space, image
representation (feature extraction), and clustering method itself. We
aim to explore which composition of these factors will produce the
best clustering results by combining various techniques from the
three components. The color spaces used are RGB, HSV, and L*a*b*.
The image representations are Histogram and Gaussian
Mixture Model (GMM), whereas the clustering methods are K-Means
and Agglomerative Hierarchical Clustering. The
results of the experiment show that the GMM representation works better
combined with the RGB and L*a*b* color spaces, whereas Histogram works
better combined with HSV. The experiments also show that K-Means
is better than Agglomerative Hierarchical Clustering for image clustering.
Abstract: Mixed-traffic (e.g., pedestrians, bicycles, and vehicles)
data at an intersection is one of the essential factors for intersection
design and traffic control. However, some data such as pedestrian
volume cannot be directly collected by common detectors (e.g.
inductive loop, sonar and microwave sensors). In this paper, a video
based detection algorithm is proposed for mixed-traffic data collection
at intersections using surveillance cameras. The algorithm is derived
from Gaussian Mixture Model (GMM), and uses a mergence time
adjustment scheme to improve the traditional algorithm. Real-world
video data were selected to test the algorithm. The results show that
the proposed algorithm has faster processing speed and higher
accuracy than the traditional algorithm. This indicates that the
improved algorithm can be applied to detect mixed traffic at
signalized intersections, even when conflicts occur.
Abstract: In this paper, an algorithm for detecting and attenuating
puff noises frequently generated in mobile environments is
proposed. As a baseline, a puff detection system is designed
based on a Gaussian Mixture Model (GMM), with 39th-order Mel Frequency
Cepstral Coefficients (MFCC) extracted as feature parameters. To
improve the detection performance, effective acoustic features for puff
detection are proposed. In addition, detected puff intervals are
attenuated by high-pass filtering. The speech recognition rate was
measured for evaluation, and a confusion matrix and ROC curve are used
to confirm the validity of the proposed system.
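Attenuating only the detected puff intervals can be sketched with a first-order high-pass filter. The filter order, the coefficient `alpha`, and the interval format (start and end sample indices) are assumptions; the abstract states only that high-pass filtering is applied to detected intervals.

```python
def high_pass(samples, alpha=0.9):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    out = []
    prev_x = prev_y = 0.0
    for i, x in enumerate(samples):
        y = x if i == 0 else alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def attenuate_puffs(signal, intervals, alpha=0.9):
    """Filter only the [start, end) sample ranges flagged as puff noise."""
    out = list(signal)
    for start, end in intervals:
        out[start:end] = high_pass(signal[start:end], alpha)
    return out


# Usage: a low-frequency burst (here a constant offset) decays inside the
# flagged interval, while samples outside it are left untouched.
signal = [0.0] * 3 + [1.0] * 20 + [0.0] * 3
cleaned = attenuate_puffs(signal, [(3, 23)], alpha=0.5)
```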
Abstract: The Gaussian mixture background model is widely used in
moving target detection in image sequences. However, the traditional
Gaussian mixture background model usually considers the time
continuity of the pixels, and establishes the background through the statistical
distribution of pixels without taking into account the pixels' spatial
similarity, which causes noise, imperfection and other problems.
This paper proposes a new Gaussian mixture modeling approach,
which combines the color and gradient of the spatial information, and
integrates the spatial information of the pixel sequences to establish
Gaussian mixture background. The experimental results show that the
movement background can be extracted accurately and efficiently, and
the algorithm is more robust, and can work in real time in tracking
applications.
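For context, the traditional per-pixel (temporal only) Gaussian mixture background model that the paper improves on can be sketched as below. It is a simplified variant for a single greyscale pixel: the learning rate is applied directly rather than scaled by the component likelihood, and the parameter values and variance floor are assumptions. The spatial color and gradient terms proposed in the paper are not modelled.

```python
import math


def matches(component, value, k=2.5):
    """True if value lies within k standard deviations of the component mean."""
    _, mean, var = component
    return (value - mean) ** 2 <= (k * k) * var


def update_pixel_model(components, value, lr=0.05, init_var=225.0, min_var=4.0):
    """Update a list of [weight, mean, variance] components in place.

    Returns True when the value matched an existing component.
    """
    matched = next((c for c in components if matches(c, value)), None)
    for c in components:
        c[0] = (1.0 - lr) * c[0] + (lr if c is matched else 0.0)
    if matched is not None:
        diff = value - matched[1]
        matched[1] += lr * diff
        matched[2] = max((1.0 - lr) * matched[2] + lr * diff * diff, min_var)
    else:
        # No match: replace the least probable component with a new one.
        weakest = min(components, key=lambda c: c[0])
        weakest[:] = [lr, float(value), init_var]
    total = sum(c[0] for c in components)
    for c in components:
        c[0] /= total
    return matched is not None


def is_background(components, value, bg_ratio=0.7):
    """Check the value against the dominant components (ranked by w / sigma)."""
    ranked = sorted(components, key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)
    cumulative = 0.0
    for comp in ranked:
        if matches(comp, value):
            return True
        cumulative += comp[0]
        if cumulative > bg_ratio:
            break
    return False


# Usage: learn a static background of intensity 100, then test new values.
model = [[0.5, 100.0, 225.0], [0.5, 0.0, 225.0]]
for _ in range(200):
    update_pixel_model(model, 100.0)
foreground = not is_background(model, 200.0)
```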
Abstract: By taking advantage of both k-NN, which is highly accurate, and K-means clustering, which reduces classification time, we introduce Cluster-k-Nearest Neighbor as a "variable k"-NN that deals with the centroid or mean point of all subclasses generated by a clustering algorithm. In general, the K-means clustering algorithm is not stable in terms of accuracy. For that reason, we develop another algorithm for clustering our space, which gives higher accuracy than K-means clustering, fewer subclasses, stability, and bounded classification time with respect to the variable data size. We obtain between 96% and 99.7% accuracy in the classification of 6 different types of time series using the K-means clustering algorithm, and 99.7% using the new clustering algorithm.
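The idea of classifying against subclass centroids rather than all training points can be sketched as follows. The sketch assumes the subclass assignment has already been produced by some clustering step (K-means or otherwise) and uses a plain nearest-centroid rule; the names and the toy data are illustrative, and the paper's own clustering algorithm is not reproduced.

```python
def centroids_by_subclass(points, subclass_labels):
    """Mean point of every subclass produced by a clustering step."""
    sums = {}
    for point, label in zip(points, subclass_labels):
        acc, n = sums.get(label, ([0.0] * len(point), 0))
        sums[label] = ([a + x for a, x in zip(acc, point)], n + 1)
    return {label: [a / n for a in acc] for label, (acc, n) in sums.items()}


def classify(point, centroids, class_of):
    """Assign the class of the nearest subclass centroid."""
    def sq_dist(label):
        return sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
    return class_of[min(centroids, key=sq_dist)]


# Usage: class A is split into two subclasses, class B has one.
points = [(0, 0), (0, 2), (10, 10), (10, 12), (5, -5)]
subclasses = ["a1", "a1", "b1", "b1", "a2"]
class_of = {"a1": "A", "a2": "A", "b1": "B"}
centroids = centroids_by_subclass(points, subclasses)
label = classify((6, -4), centroids, class_of)
```

With this rule, classification cost grows with the number of centroids rather than the number of training points, which is what bounds the classification time.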
Abstract: An adaptive spatial Gaussian mixture model is proposed for clustering-based color image segmentation. A new clustering objective function that incorporates spatial information is introduced in the Bayesian framework. The weighting parameter controlling the importance of spatial information is made adaptive to the image content, to increase smoothness within piecewise-homogeneous regions and diminish the edge-blurring effect; hence the name adaptive spatial finite mixture model. The proposed approach is compared with the spatially variant finite mixture model for pixel labeling. The experimental results with synthetic images and the Berkeley dataset demonstrate that the proposed method is effective in improving the segmentation and can be employed in different practical image content understanding applications.