Abstract: This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme that incorporates the advantages of both the generative and discriminative model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is assigned to its corresponding group and is then evaluated using Gaussian mixture models (GMMs). Experimental results show that the proposed method can significantly improve performance on the text-independent speaker identification task. We report reductions of up to 50% in identification error rate compared to the baseline statistical model.
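The GMM evaluation stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs stored as lists of (weight, mean, variance) tuples, and it takes the candidate group as given (the SVM tree that would produce it is not modeled). All names and toy values are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log sum_k w_k N(x | mean_k, var_k)."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def identify(frames, speaker_gmms, group):
    """Pick the speaker in `group` whose GMM scores the frames highest."""
    return max(group, key=lambda s: gmm_log_likelihood(frames, speaker_gmms[s]))

# Toy models (hypothetical): each GMM is a list of (weight, mean, variance).
speaker_gmms = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
    "spk_c": [(1.0, [-4.0, 3.0], [1.0, 1.0])],
}
frames = [[4.2, 4.1], [4.9, 5.2], [4.5, 4.4]]
group = ["spk_a", "spk_b"]  # assumed output of the SVM tree stage
best = identify(frames, speaker_gmms, group)
print(best)
```

Restricting the search to `group` is what keeps the GMM stage cheap: only the speakers in the selected subset are scored.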
Abstract: Recently, much research has been conducted to
retrieve pertinent parameters and adequate models for automatic
music genre classification. In this paper, two measures based upon
information theory concepts are investigated for mapping the feature
space to the decision space. A Gaussian Mixture Model (GMM) is used
as a baseline and reference system. Various strategies are proposed
for training and testing sessions with matched or mismatched
conditions: long training with long testing, and long training with
short testing. In all experiments, the file sections used for testing
are never used during training. With matched conditions, all
examined measures yield similarly high scores (almost 100%).
With mismatched conditions, the proposed measures yield better
scores than the GMM baseline system, especially for the short testing
case. It is also observed that the average discrimination information
measure is most appropriate for music category classification, whereas
the divergence measure is more suitable for music
subcategory classification.
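The abstract does not define its two measures. As a loosely related illustration only, the sketch below assumes they build on Kullback's discrimination information (the KL divergence) and its symmetrized form, the divergence, and evaluates both in closed form for diagonal-covariance Gaussians. The genre names and statistics are made up.

```python
import math

def kl_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for diagonal-covariance Gaussians, summed over dimensions."""
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def divergence(mean_p, var_p, mean_q, var_q):
    """Symmetric divergence J(p, q) = KL(p || q) + KL(q || p)."""
    return (kl_gauss(mean_p, var_p, mean_q, var_q)
            + kl_gauss(mean_q, var_q, mean_p, var_p))

# Hypothetical per-genre feature statistics: (mean vector, variance vector).
genres = {
    "classical": ([0.0, 1.0], [1.0, 1.0]),
    "rock": ([3.0, -1.0], [2.0, 0.5]),
}
test_stats = ([2.5, -0.5], [1.5, 0.8])  # statistics of a test excerpt

# Map the excerpt to the genre with the smallest divergence.
nearest = min(genres, key=lambda g: divergence(*test_stats, *genres[g]))
print(nearest)
```

Unlike a likelihood score, the divergence compares two distributions directly, so the test excerpt is summarized by its own statistics before the decision is made.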
Abstract: Image clustering is a process of grouping images
based on their similarity. Image clustering usually uses color,
texture, edge, or shape components, or a mixture of two of them.
This research aims to explore image clustering using color
composition. In order to complete this image clustering, three main
components should be considered: color space, image
representation (feature extraction), and the clustering method itself. We
aim to explore which composition of these factors will produce the
best clustering results by combining various techniques from the
three components. The color spaces considered are RGB, HSV, and
L*a*b*. The image representations are Histogram and Gaussian
Mixture Model (GMM), whereas the clustering methods are K-Means
and the Agglomerative Hierarchical Clustering algorithm. The
results of the experiment show that GMM representation is better
combined with RGB and L*a*b* color space, whereas Histogram is
better combined with HSV. The experiments also show that K-Means
is better than Agglomerative Hierarchical Clustering for image clustering.
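One of the combinations explored above (RGB color space, Histogram representation, K-Means clustering) can be sketched as below. This is a toy illustration under stated assumptions: images are lists of (r, g, b) tuples, the histogram uses a coarse 2 bins per channel, and K-Means starts from hand-picked initial centers so the run is deterministic. None of these choices come from the paper.

```python
def color_histogram(pixels, bins=2):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = 0
        for c in (r, g, b):
            idx = idx * bins + min(c * bins // 256, bins - 1)
        hist[idx] += 1.0
    return [h / len(pixels) for h in hist]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]

def kmeans(points, init_centers, iters=10):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster goes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)

# Four tiny hypothetical "images": two mostly red, two mostly blue.
images = [
    [(250, 10, 10)] * 4,
    [(240, 20, 30)] * 3 + [(10, 10, 10)],
    [(10, 10, 250)] * 4,
    [(20, 30, 240)] * 3 + [(10, 10, 10)],
]
features = [color_histogram(img) for img in images]
labels = kmeans(features, [features[0], features[2]])
print(labels)
```

Swapping the color space or the representation only changes how `features` is built; the clustering step is unaffected.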
Abstract: Real-world Speaker Identification (SI) applications
differ from ideal or laboratory conditions, causing perturbations that
lead to a mismatch between the training and testing environments
and degrade performance drastically. Many strategies have been
adopted to cope with acoustical degradation; the wavelet-based Bayesian
marginal model is one of them. However, Bayesian marginal models
cannot model the inter-scale statistical dependencies of different
wavelet scales. Simple nonlinear estimators for wavelet-based
denoising assume that the wavelet coefficients in different scales are
independent in nature. However, wavelet coefficients have significant
inter-scale dependency. This paper models this inter-scale
dependency with a Circularly Symmetric Probability Density
Function (CS-PDF) related to the family of Spherically Invariant
Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain,
and the corresponding joint shrinkage estimator is derived using the
Maximum a Posteriori (MAP) criterion. A framework based on these is
proposed to denoise speech signals for automatic speaker identification
problems. The robustness of the proposed framework is tested for
a Text-Independent Speaker Identification application on 100 speakers
of the POLYCOST and 100 speakers of the YOHO speech databases in three
different noise environments. Experimental results show that the
proposed estimator yields a higher improvement in identification
accuracy than other estimators with the popular Gaussian Mixture
Model (GMM) based speaker model and Mel-Frequency Cepstral
Coefficient (MFCC) features.
Abstract: Mixed-traffic (e.g., pedestrians, bicycles, and vehicles)
data at an intersection is one of the essential factors for intersection
design and traffic control. However, some data such as pedestrian
volume cannot be directly collected by common detectors (e.g.
inductive loop, sonar and microwave sensors). In this paper, a video
based detection algorithm is proposed for mixed-traffic data collection
at intersections using surveillance cameras. The algorithm is derived
from the Gaussian Mixture Model (GMM) and uses a mergence time
adjustment scheme to improve the traditional algorithm. Real-world
video data were selected to test the algorithm. The results show that
the proposed algorithm has faster processing speed and higher
accuracy than the traditional algorithm. This indicates that the
improved algorithm can be applied to detect mixed traffic at
signalized intersections, even when conflicts occur.
Abstract: In this paper, an algorithm for detecting and attenuating
puff noises frequently generated in mobile environments is
proposed. As a baseline, a puff detection system is designed
based on a Gaussian Mixture Model (GMM), with 39th-order Mel Frequency
Cepstral Coefficients (MFCCs) extracted as feature parameters. To
improve the detection performance, effective acoustic features for puff
detection are proposed. In addition, detected puff intervals are
attenuated by high-pass filtering. The speech recognition rate was
measured for evaluation, and a confusion matrix and ROC curve are used
to confirm the validity of the proposed system.
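The attenuation step (high-pass filtering applied only inside detected puff intervals) can be sketched as follows. The abstract does not specify the filter, so this assumes a simple first-order high-pass filter; the cutoff, the sample rate, the interval boundaries, and the 50 Hz test tone standing in for low-frequency puff energy are all hypothetical.

```python
import math

def high_pass(signal, cutoff_hz, sample_rate_hz):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(alpha * (out[-1] + signal[n] - signal[n - 1]))
    return out

def attenuate_intervals(signal, intervals, cutoff_hz, sample_rate_hz):
    """Replace samples inside each [start, end) interval by filtered ones."""
    filtered = high_pass(signal, cutoff_hz, sample_rate_hz)
    out = list(signal)
    for start, end in intervals:
        out[start:end] = filtered[start:end]
    return out

# Toy input: a 50 Hz tone sampled at 8 kHz.
rate = 8000
signal = [math.sin(2.0 * math.pi * 50.0 * n / rate) for n in range(800)]
puff_intervals = [(200, 600)]  # assumed output of the puff detector
cleaned = attenuate_intervals(signal, puff_intervals, 1000.0, rate)
print(max(abs(v) for v in cleaned[200:600]))
```

Samples outside the detected intervals are passed through untouched, so speech that the detector did not flag is not filtered.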
Abstract: A state of the art Speaker Identification (SI) system
requires a robust feature extraction unit followed by a speaker
modeling scheme for generalized representation of these features.
Over the years, Mel-Frequency Cepstral Coefficients (MFCC)
modeled on the human auditory system have been used as a standard
acoustic feature set for speech-related applications. In a recent
contribution by the authors, it was shown that the Inverted
Mel-Frequency Cepstral Coefficients (IMFCC) are a useful feature set
for SI, containing complementary information present in the
high-frequency region. This paper introduces the Gaussian-shaped filter
(GF) for calculating MFCC and IMFCC in place of the typical
triangular-shaped bins. The objective is to introduce a higher
amount of correlation between subband outputs. The performance
of both MFCC and IMFCC improves with GF over the conventional
triangular filter (TF) based implementation, individually as well as
in combination. With GMM as the speaker modeling paradigm, the
performance of the proposed GF-based MFCC and IMFCC in
individual and fused modes has been verified on two standard
databases, YOHO (microphone speech) and POLYCOST
(telephone speech), each of which has more than 130 speakers.
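The difference between triangular and Gaussian-shaped filters can be sketched as below. This is an illustration under assumptions, not the authors' design: it uses the common mel mapping mel = 2595 log10(1 + f/700), and the Gaussian width (`spread` times half the filter's bandwidth) is a hypothetical choice, since the abstract does not give the paper's parametrization. The inverted (IMFCC) filter bank is not shown.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_edges(n_filters, f_low, f_high):
    """n_filters + 2 boundary frequencies (Hz), equally spaced in mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def triangular_weight(f, left, center, right):
    """Conventional triangular filter: zero outside (left, right)."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0

def gaussian_weight(f, left, center, right, spread=0.5):
    """Gaussian-shaped filter; sigma tied to bandwidth is an assumption."""
    sigma = spread * (right - left) / 2.0
    return math.exp(-0.5 * ((f - center) / sigma) ** 2)

edges = mel_edges(20, 0.0, 4000.0)
left, center, right = edges[5], edges[6], edges[7]  # one filter
neighbor_center = right  # center of the next filter in the bank

tri_at_neighbor = triangular_weight(neighbor_center, left, center, right)
gauss_at_neighbor = gaussian_weight(neighbor_center, left, center, right)
print(tri_at_neighbor, gauss_at_neighbor)
```

The triangular filter has zero weight at its neighbor's center, while the Gaussian one still has a small nonzero weight there; that overlap is what correlates adjacent subband outputs.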
Abstract: This paper discusses, for the first time, how the
challenges facing guard-band designs, including the margin
assist-circuit scheme for the screening test, should be addressed in
the coming process generations. The impacts of increased screening
error are discussed based on the proposed statistical analysis
models. It is shown that the yield loss caused by
misjudgment in the screening test would become five orders of
magnitude larger than that for the conventional one when the
amplitude of variations caused by random telegraph noise (RTN)
approaches that of random dopant fluctuation. Three fitting methods
that approximate the complex RTN-induced Gamma mixture
distributions by a simple Gaussian mixture model (GMM) are
proposed and compared. It has been verified that the proposed
methods can reduce the error of the fail-bit predictions by four orders
of magnitude.
Abstract: The aim of this paper is to investigate the influence of
market share and diversification on nonlife insurers' performance.
The underlying relationships have been investigated in different
industries and different disciplines (economics, management, ...); still,
no consistency exists in either the magnitude or the statistical
significance of the relationship between market share (and
diversification as well) on one side and companies' performance on
the other. Moreover, the direction of the relationship is also
somewhat questionable. While some authors find this relationship to
be positive, others find it to be negative. In order to test
the influence of market share and diversification on companies'
performance in the Croatian nonlife insurance industry for the period
from 1999 to 2009, we designed an empirical model in which we
included the following independent variables: firms' profitability
from previous years, market share, diversification, and control
variables (i.e., ownership, industrial concentration, GDP per capita,
inflation). Using the two-step generalized method of moments
(GMM) estimator, we found evidence of a positive and statistically
significant influence of both market share and diversification on
insurers' profitability.
Abstract: Distant-talking voice-based HCI systems suffer from
performance degradation due to mismatch between the acoustic
speech (runtime) and the acoustic model (training). Mismatch is
caused by the change in the power of the speech signal as observed at
the microphones. This change is greatly influenced by the change in
distance, affecting speech dynamics inside the room before reaching
the microphones. Moreover, as the speech signal is reflected, its
acoustical characteristic is also altered by the room properties. In
general, power mismatch due to distance is a complex problem. This
paper presents a novel approach in dealing with distance-induced
mismatch by intelligently sensing instantaneous voice power variation
and compensating model parameters. First, the distant-talking speech
signal is processed through microphone array processing, and the
corresponding distance information is extracted. Distance-sensitive
Gaussian Mixture Models (GMMs), pre-trained to capture both
speech power and room properties, are used to predict the optimal
distance of the speech source. Consequently, pre-computed statistical
priors corresponding to the optimal distance are selected to correct
the statistics of the generic model, which was frozen during training.
Thus, model combinatorics are post-conditioned to match the power
of instantaneous speech acoustics at runtime. This results in an
improved likelihood of predicting the correct speech command at
farther distances. We experiment using real data recorded inside two
rooms. Experimental evaluation shows that voice recognition performance
using our method is more robust to changes in distance than
the conventional approach. In our experiment, under the most
acoustically challenging environment (i.e., Room 2: 2.5 meters), our
method achieved a 24.2% improvement in recognition performance
over the best-performing conventional method.
Abstract: The study examines the determinants of corporate cash holdings of non-financial quoted firms in Nigeria using a sample of fifty-four non-financial quoted firms listed on the Nigeria Stock Exchange for the period 1995-2009. Data were sourced from the annual reports of the sampled firms and analyzed using the Generalized Method of Moments (GMM). The study finds evidence supportive of a target adjustment model and that firms cannot instantaneously adjust towards the target cash level because adjustment is costly. Also, the results show a significant negative relationship between cash holdings and firm size, net working capital, return on assets and bank relationship, and a positive relationship with growth opportunities, leverage, inventories, accounts receivable and financial distress. Furthermore, there is no significant relationship between cash holdings and cash flow. In the Nigerian setting, most of the variables that are relevant for explaining cash holdings in developed countries are found by this study to be relevant as well.
Abstract: Mel Frequency Cepstral Coefficient (MFCC) features
are widely used as acoustic features for speech recognition as well
as speaker recognition. In MFCC feature representation, the Mel frequency
scale is used to get a high resolution in low frequency region,
and a low resolution in high frequency region. This kind of processing
is good for obtaining stable phonetic information, but not suitable
for speaker features that are located in high frequency regions.
Speaker-specific information, which is non-uniformly distributed
in the high frequencies, is equally important for speaker recognition.
Based on this fact, we propose an admissible wavelet packet based
filter structure for speaker identification. Multiresolution capabilities
of the wavelet packet transform are used to derive the new features.
The proposed scheme differs from previous wavelet based works,
mainly in designing the filter structure. Unlike others, the proposed
filter structure does not follow the Mel scale. The closed-set speaker
identification experiments performed on the TIMIT database show
improved identification performance compared to other commonly
used Mel scale based filter structures using wavelets.
Abstract: A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.
Abstract: In this paper, a novel method for recognizing musical
instruments in polyphonic music using an embedded hidden
Markov model (EHMM) is presented. EHMM is a doubly
embedded HMM structure where each state of the external HMM
is an independent HMM. The classification is accomplished for
two different internal HMM structures where GMMs are used as
likelihood estimators for the internal HMMs. The results are compared
to those achieved by an artificial neural network with two
hidden layers. Appropriate classification accuracies were achieved
both for solo instrument performances and for instrument combinations,
which demonstrates that the new approach outperforms similar
classification methods by means of the dynamics of the signal.
Abstract: One major source of performance decline in speaker
recognition systems is channel mismatch between training and testing.
This paper focuses on improving the channel robustness of a speaker
recognition system in two respects: channel compensation techniques
and channel-robust features. The system is a text-independent speaker
identification system based on two-stage recognition. For
channel compensation, this paper applies the MAP (Maximum
A Posteriori Probability) channel compensation technique, which has
been used in speech recognition, to the speaker recognition system. For
channel-robust features, this paper introduces
pitch-dependent features and a pitch-dependent speaker model for the
second-stage recognition. After the first-stage recognition of the
test speech using a GMM (Gaussian Mixture Model), the system
uses the GMM scores to decide whether the speech needs to be
recognized again. If it does, the system selects a few speakers from
all of the speakers who participated in the first-stage recognition
for the second-stage recognition. For each selected speaker, the
system obtains 3 pitch-dependent results from that speaker's
pitch-dependent speaker model, and
then uses an ANN (Artificial Neural Network) to combine the 3
pitch-dependent results and 1 GMM score into a fused result.
The system performs the second-stage recognition based on these fused
results. The experiments show that the correct rate of the two-stage
recognition system based on the MAP channel compensation technique
and pitch-dependent features is 41.7% better than that of the baseline
system in the closed-set test.
Abstract: Current image-based individual human recognition
methods, such as fingerprints, face, or iris biometric modalities,
generally require a cooperative subject, views from certain aspects,
and physical contact or close proximity. These methods cannot
reliably recognize non-cooperating individuals at a distance in the
real world under changing environmental conditions. Gait, which
concerns recognizing individuals by the way they walk, is a relatively
new biometric without these disadvantages. The inherent gait
characteristic of an individual makes it irreplaceable and useful in
visual surveillance.
In this paper, an efficient gait recognition system for human
identification is proposed that extracts two features, namely the
width vector of the binary silhouette and the MPEG-7 region-based shape
descriptors. In the proposed method, foreground objects,
i.e., humans and other moving objects, are extracted by estimating
background information with a Gaussian Mixture Model (GMM);
subsequently, a median filtering operation is performed to remove
noise in the background-subtracted image. A moving target classification
algorithm is used to separate human beings (i.e., pedestrians)
from other foreground objects (viz., vehicles). Shape and boundary
information is used in the moving target classification algorithm.
Subsequently, the width vector of the outer contour of the binary
silhouette and the MPEG-7 Angular Radial Transform coefficients are
taken as the feature vector. Next, Principal Component Analysis (PCA)
is applied to the selected feature vector to reduce its dimensionality.
These extracted feature vectors are used to train a Hidden Markov
Model (HMM) for identification of some individuals. The proposed
system is evaluated using some gait sequences and the experimental
results show the efficacy of the proposed algorithm.
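The width-vector feature mentioned above can be sketched as follows. The abstract does not give the exact definition, so this assumes the width of a row is the horizontal extent between the leftmost and rightmost foreground pixels of the binary silhouette, with empty rows given width 0. The toy silhouette is made up.

```python
def width_vector(silhouette):
    """Per-row horizontal extent of the foreground in a binary image."""
    widths = []
    for row in silhouette:
        cols = [c for c, v in enumerate(row) if v]
        # Extent between outermost foreground pixels; 0 for empty rows.
        widths.append(cols[-1] - cols[0] + 1 if cols else 0)
    return widths

# Hypothetical 6x6 silhouette: 1 = foreground (person), 0 = background.
silhouette = [
    [0, 0, 1, 1, 0, 0],  # head
    [0, 1, 1, 1, 1, 0],  # torso
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],  # hips
    [0, 1, 0, 0, 1, 0],  # legs apart: extent spans the gap
    [0, 0, 0, 0, 0, 0],
]
widths = width_vector(silhouette)
print(widths)
```

Computing this vector for every frame of a walking sequence gives a time series whose variation (for example, the leg rows widening and narrowing) is what a sequence model such as an HMM would be trained on.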