Abstract: This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme that incorporates the advantages of both the generative and discriminative model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is assigned to its corresponding group, and evaluation using Gaussian mixture models (GMMs) is then performed. Experimental results show that the proposed method can significantly improve performance on the text-independent speaker identification task. We report reductions of up to 50% in identification error rate compared to the baseline statistical model.
Abstract: Recently, much research has been conducted to
retrieve pertinent parameters and adequate models for automatic
music genre classification. In this paper, two measures based upon
information theory concepts are investigated for mapping the feature
space to the decision space. A Gaussian Mixture Model (GMM) is used
as a baseline and reference system. Various strategies are proposed
for training and testing sessions with matched or mismatched
conditions, long training and long testing, long training and short
testing. For all experiments, the file sections used for testing were
never used during training. With matched conditions, all
examined measures yield their best and mutually similar scores (almost 100%).
With mismatched conditions, the proposed measures yield better
scores than the GMM baseline system, especially for the short testing
case. It is also observed that the average discrimination information
measure is most appropriate for music category classification, whereas
the divergence measure is more suitable for music
subcategory classification.
Abstract: Image clustering is a process of grouping images
based on their similarity. Image clustering usually uses color,
texture, edge, or shape features, or a mixture of two of them.
This research aims to explore image clustering using color
composition. To perform this image clustering, three main
components should be considered: the color space, the image
representation (feature extraction), and the clustering method itself. We
aim to explore which composition of these factors will produce the
best clustering results by combining various techniques from the
three components. The color spaces are RGB, HSV, and L*a*b*.
The image representations are the histogram and the Gaussian
Mixture Model (GMM), whereas the clustering methods are K-Means
and Agglomerative Hierarchical Clustering. The
results of the experiments show that the GMM representation works better
with the RGB and L*a*b* color spaces, whereas the histogram works
better with HSV. The experiments also show that K-Means
performs better than Agglomerative Hierarchical Clustering for image clustering.
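One of the combinations named in this abstract (RGB color space, histogram representation, K-Means clustering) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the 4-bins-per-channel quantization, and the deterministic farthest-first initialization are all assumptions made for the example.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 cells.

    pixels: list of (r, g, b) tuples with values in 0..255.
    bins is assumed to divide 256 evenly (hypothetical choice).
    """
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1.0
    return [count / len(pixels) for count in hist]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain K-Means; returns one cluster label per input vector."""
    # Farthest-first initialization (an assumption, chosen for determinism).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Swapping the color space or the representation only changes how each image is turned into a vector; the clustering step stays the same, which is what makes the factor-by-factor comparison in the abstract possible.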
Abstract: Mixed-traffic (e.g., pedestrians, bicycles, and vehicles)
data at an intersection is one of the essential factors for intersection
design and traffic control. However, some data such as pedestrian
volume cannot be directly collected by common detectors (e.g.
inductive loop, sonar and microwave sensors). In this paper, a video
based detection algorithm is proposed for mixed-traffic data collection
at intersections using surveillance cameras. The algorithm is derived
from the Gaussian Mixture Model (GMM) and uses a mergence time
adjustment scheme to improve the traditional algorithm. Real-world
video data were selected to test the algorithm. The results show that
the proposed algorithm has faster processing speed and higher
accuracy than the traditional algorithm. This indicates that the
improved algorithm can be applied to detect mixed traffic at
signalized intersections, even when conflicts occur.
Abstract: In many applications, it is a priori known that the
target function should satisfy certain constraints imposed by, for
example, economic theory or a human decision maker. Here we
consider partially monotone problems, where the target variable
depends monotonically on some of the predictor variables but not all.
We propose an approach to build partially monotone models based
on the convolution of monotone neural networks and kernel
functions. The results from simulations and a real case study on
house pricing show that our approach has significantly better
performance than partially monotone linear models. Furthermore, the
incorporation of partial monotonicity constraints not only leads to
models that are in accordance with the decision maker's expertise,
but also considerably reduces the model variance in comparison to
standard neural networks with weight decay.
Abstract: An algorithm for learning an overcomplete dictionary
using a Cauchy mixture model for sparse decomposition of an underdetermined
mixing system is introduced. The mixture density
function is derived from a ratio sample of the observed mixture
signals where 1) there are at least two but not necessarily more
mixture signals observed, 2) the source signals are statistically
independent and 3) the sources are sparse. The basis vectors of the
dictionary are learned via the optimization of the location parameters
of the Cauchy mixture components, which is shown to be more
accurate and robust than the conventional data mining methods
usually employed for this task. Using a well-known sparse
decomposition algorithm, we extract three speech signals from two
mixtures based on the estimated dictionary. Further tests with
additive Gaussian noise are used to demonstrate the proposed
algorithm's robustness to outliers.
Abstract: The paper presents a method for multivariate time series forecasting using Independent Component Analysis (ICA) as a preprocessing tool. The idea of this approach is to do the forecasting in the space of independent components (sources) and then transform the results back to the original time series space. The forecasting can be done separately, and with a different method, for each component, depending on its time structure. The paper also reviews the main algorithms for independent component analysis in the case of instantaneous mixture models, using second- and higher-order statistics. The method has been applied in simulation to an artificial multivariate time series with five components, generated from three sources and a randomly generated mixing matrix.
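The forecast-in-source-space idea can be illustrated with a small sketch. It makes strong simplifying assumptions that are not in the abstract: the mixing matrix is taken as already known (in practice an ICA algorithm would estimate it), the system is square 2x2 rather than five series from three sources, and the two per-component forecasters (`persistence`, `linear_trend`) are hypothetical stand-ins for whatever methods suit each source.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def invert_2x2(a):
    """Inverse of a 2x2 matrix; stands in for the unmixing matrix from ICA."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def persistence(series):
    """One-step forecast: repeat the last value."""
    return series[-1]


def linear_trend(series):
    """One-step forecast: extend the last observed increment."""
    return series[-1] + (series[-1] - series[-2])


def forecast_via_sources(observations, mixing, forecasters):
    """Forecast the next observation vector through the source space.

    1. Unmix every observation into source coordinates.
    2. Forecast each source separately with its own method.
    3. Map the forecast sources back with the mixing matrix.
    """
    unmixing = invert_2x2(mixing)
    sources = [mat_vec(unmixing, x) for x in observations]
    per_source = [list(s) for s in zip(*sources)]
    next_sources = [f(s) for f, s in zip(forecasters, per_source)]
    return mat_vec(mixing, next_sources)
```

Because each source gets its own forecaster, a trending component and a constant component can be handled by different methods even though every observed series is a blend of both.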
Abstract: In this paper, an algorithm for detecting and attenuating
puff noises frequently generated in mobile environments is
proposed. As a baseline, a puff detection system is designed
based on a Gaussian Mixture Model (GMM), with 39th-order Mel-Frequency
Cepstral Coefficients (MFCCs) extracted as feature parameters. To
improve the detection performance, effective acoustic features for puff
detection are proposed. In addition, detected puff intervals are
attenuated by high-pass filtering. The speech recognition rate is
measured for evaluation, and a confusion matrix and ROC curve are used
to confirm the validity of the proposed system.
Abstract: The Gaussian mixture background model is widely used for
moving target detection in image sequences. However, the traditional
Gaussian mixture background model usually considers the time
continuity of the pixels and establishes the background through the statistical
distribution of pixels without taking into account the pixels' spatial
similarity, which causes noise, imperfections, and other problems.
This paper proposes a new Gaussian mixture modeling approach,
which combines the color and gradient of the spatial information, and
integrates the spatial information of the pixel sequences to establish
Gaussian mixture background. The experimental results show that the
movement background can be extracted accurately and efficiently, and that
the algorithm is more robust and can work in real time in tracking
applications.
Abstract: By taking advantage of both k-NN, which is highly accurate, and K-means clustering, which reduces classification time, we introduce Cluster-k-Nearest Neighbor as a "variable k"-NN that operates on the centroids (mean points) of all subclasses generated by a clustering algorithm. In general, the K-means clustering algorithm is not stable in terms of accuracy. For that reason, we develop another clustering algorithm that gives higher accuracy than K-means clustering, fewer subclasses, stability, and a bounded classification time with respect to the data size. We obtain between 96% and 99.7% accuracy in the classification of 6 different types of time series using the K-means clustering algorithm, and 99.7% using the new clustering algorithm.
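The core idea of classifying against subclass centroids instead of every training point can be sketched as below. This is a simplified illustration only: it assumes the subclasses have already been produced by some clustering step (K-means or otherwise), it uses a plain nearest-centroid rule rather than the paper's "variable k" scheme, and all names are hypothetical.

```python
def centroid(points):
    """Mean point of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def build_centroids(labelled_subclasses):
    """Reduce each (class_label, points) subclass to (class_label, centroid)."""
    return [(label, centroid(points)) for label, points in labelled_subclasses]


def classify(query, centroids):
    """Return the class label of the subclass centroid nearest to the query."""
    label, _ = min(
        centroids,
        key=lambda item: sum((q - c) ** 2 for q, c in zip(query, item[1])),
    )
    return label
```

Classification cost then depends on the number of subclasses rather than on the number of training points, which is the source of the bounded classification time mentioned in the abstract.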
Abstract: In this research, a latent class vector model for pairwise data is formulated. As compared to the basic vector model, this model yields consistent estimates of the parameters since the number of parameters to be estimated does not increase with the number of subjects. The result of the analysis reveals that the model was stable and could classify each subject to the latent classes representing the typical scales used by these subjects.
Abstract: An adaptive spatial Gaussian mixture model is proposed for clustering-based color image segmentation. A new clustering objective function that incorporates spatial information is introduced in the Bayesian framework. The weighting parameter controlling the importance of spatial information is made adaptive to the image content, to increase smoothness within piecewise-homogeneous regions and diminish the edge-blurring effect; hence the name adaptive spatial finite mixture model. The proposed approach is compared with the spatially variant finite mixture model for pixel labeling. Experimental results on synthetic and Berkeley datasets demonstrate that the proposed method improves segmentation and can be employed in different practical image content understanding applications.
Abstract: Despite the widespread application of finite
mixture models in segmentation, finite mixture model selection is
still an important issue. In fact, the selection of an adequate number
of segments is a key issue in deriving latent segment structures, and
it is desirable that the selection criteria used for this end are effective.
To select among several information criteria that may
support the selection of the correct number of segments, we conduct a
simulation study. In particular, this study is intended to determine
which information criteria are more appropriate for mixture model
selection when considering data sets with only categorical
segmentation base variables. The generation of mixtures of
multinomial data supports the proposed analysis. As a result, we
establish a relationship between the level of measurement of
segmentation variables and the performance of eleven information criteria.
The criterion AIC3 shows better performance (it
indicates the correct number of segments in the simulated structure
more often) when referring to mixtures of multinomial segmentation
base variables.
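For reference, AIC3 differs from the usual AIC only in the penalty weight on the number of free parameters. The parameter count shown below is an assumption: it is the standard count for a latent class (multinomial mixture) model with local independence, which the abstract does not state explicitly. Here $\hat{L}$ is the maximized likelihood, $p$ the number of free parameters, $S$ the number of segments, and $C_j$ the number of categories of the $j$-th of $J$ segmentation base variables.

```latex
\begin{align*}
\mathrm{AIC}  &= -2\ln\hat{L} + 2p \\
\mathrm{AIC3} &= -2\ln\hat{L} + 3p \\
p &= (S - 1) + S\sum_{j=1}^{J} (C_j - 1)
\end{align*}
```

The number of segments is then chosen as the value of $S$ that minimizes the criterion over the candidate models.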
Abstract: Image-based 3D scenes can now be found in many popular vision systems, computer games, and virtual reality tours. It is therefore important to segment the ROI (region of interest) from input scenes as a preprocessing step for geometric structure detection in a 3D scene. In this paper, we propose a method for segmenting the ROI based on tensor voting and a Dirichlet process mixture model. In particular, to estimate geometric structure information for a 3D scene from a single outdoor image, we apply tensor voting and the Dirichlet process mixture model to image segmentation. Tensor voting is used based on the fact that homogeneous regions in an image are usually close together on a smooth region, and therefore the tokens corresponding to the centers of these regions have high saliency values. The proposed approach is a novel nonparametric Bayesian segmentation method that uses a Gaussian Dirichlet process mixture model to automatically segment various natural scenes. Finally, our method can label regions of the input image into coarse categories, "ground", "sky", and "vertical", for 3D applications. The experimental results show that our method successfully segments coarse regions in many complex natural scene images for 3D.
Abstract: In this paper, we present the region-based hidden Markov random field model (RBHMRF), which encodes the characteristics of different brain regions into a probabilistic framework for brain MR image segmentation. The recently proposed TV+L1 model is used for region extraction. By utilizing different spatial characteristics in different brain regions, the RBHMRF model outperforms the current state-of-the-art method, the hidden Markov random field model (HMRF), which uses identical spatial information throughout the whole brain. Experiments on both real and synthetic 3D MR images show that the segmentation results of the proposed method are more accurate than those of existing algorithms.
Abstract: Distant-talking voice-based HCI systems suffer from
performance degradation due to mismatch between the acoustic
speech (runtime) and the acoustic model (training). Mismatch is
caused by the change in the power of the speech signal as observed at
the microphones. This change is greatly influenced by the change in
distance, affecting speech dynamics inside the room before reaching
the microphones. Moreover, as the speech signal is reflected, its
acoustical characteristic is also altered by the room properties. In
general, power mismatch due to distance is a complex problem. This
paper presents a novel approach in dealing with distance-induced
mismatch by intelligently sensing instantaneous voice power variation
and compensating model parameters. First, the distant-talking speech
signal is processed through microphone array processing, and the
corresponding distance information is extracted. Distance-sensitive
Gaussian Mixture Models (GMMs), pre-trained to capture both
speech power and room properties, are used to predict the optimal
distance of the speech source. Consequently, pre-computed statistical
priors corresponding to the optimal distance are selected to correct
the statistics of the generic model which was frozen during training.
Thus, model combinatorics are post-conditioned to match the power
of instantaneous speech acoustics at runtime. This results in an
improved likelihood in predicting the correct speech command at
farther distances. We experiment using real data recorded inside two
rooms. Experimental evaluation shows voice recognition performance
using our method is more robust to the change in distance compared
to the conventional approach. In our experiment, under the most
acoustically challenging environment (i.e., Room 2: 2.5 meters), our
method achieved a 24.2% improvement in recognition performance
against the best-performing conventional method.
Abstract: In the present study, the pressure drop and laminar convection heat transfer characteristics of nanofluids in a microchannel heat sink with a square duct are numerically investigated. Water-based nanofluids containing Al2O3 and CuO particles at volume fractions of 0%, 0.5%, 1%, 1.5% and 2% are used to analyze their effects on heat transfer and pressure drop. Under laminar, steady-state flow conditions, the finite volume method is used to solve the governing equations of heat transfer. The mixture model is used to simulate the nanofluid flow. To verify the numerical method, the results of the numerical calculations are compared with results in the literature for both pure water and the nanofluids at different volume fractions. The distribution of the particles in the base fluid is assumed to be uniform. The results are evaluated in terms of Nusselt number, pressure drop, and heat transfer enhancement. The analysis shows that the nanofluids enhance heat transfer increasingly as the Reynolds number and the volume fraction increase. The best overall enhancement was obtained at φ = 2% and Re = 100 for the CuO-water nanofluid.
Abstract: An unsupervised classification algorithm is derived
by modeling observed data as a mixture of several mutually
exclusive classes that are each described by linear combinations of
independent non-Gaussian densities. The algorithm estimates the
data density in each class by using parametric nonlinear functions
that fit to the non-Gaussian structure of the data. This improves
classification accuracy compared with standard Gaussian mixture
models. When applied to textures, the algorithm can learn basis
functions for images that capture the statistically significant structure
intrinsic in the images. We apply this technique to the problem of
unsupervised texture classification and segmentation.
Abstract: Skin color based tracking techniques often assume a
static skin color model obtained either from an offline set of library
images or the first few frames of a video stream. These models
can perform poorly in the presence of changing lighting or
imaging conditions. We propose an adaptive skin color model based
on the Gaussian mixture model to handle the changing conditions.
Initial estimates of the number and weights of skin color clusters
are obtained using a modified form of the general
Expectation-Maximization algorithm. The model adapts to changes in imaging
conditions and refines the model parameters dynamically using spatial
and temporal constraints. Experimental results show that the method
can be used to effectively track hand and face regions.
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure, which improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.