A Hybrid GMM/SVM System for Text Independent Speaker Identification

This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme which appropriately incorporates the advantages of both the generative and discriminant model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speakers' space into small subsets of speakers within a hierarchical tree structure. During testing a speech token is assigned to its corresponding group and evaluation using gaussian mixture models (GMMs) is then processed. Experimental results show that the proposed method can significantly improve the performance of text independent speaker identification task. We report improvements of up to 50% reduction in identification error rate compared to the baseline statistical model.

Automatic Musical Genre Classification Using Divergence and Average Information Measures

Recently many research has been conducted to retrieve pertinent parameters and adequate models for automatic music genre classification. In this paper, two measures based upon information theory concepts are investigated for mapping the features space to decision space. A Gaussian Mixture Model (GMM) is used as a baseline and reference system. Various strategies are proposed for training and testing sessions with matched or mismatched conditions, long training and long testing, long training and short testing. For all experiments, the file sections used for testing are never been used during training. With matched conditions all examined measures yield the best and similar scores (almost 100%). With mismatched conditions, the proposed measures yield better scores than the GMM baseline system, especially for the short testing case. It is also observed that the average discrimination information measure is most appropriate for music category classifications and on the other hand the divergence measure is more suitable for music subcategory classifications.

Journey on Image Clustering Based on Color Composition

Image clustering is a process of grouping images based on their similarity. The image clustering usually uses the color component, texture, edge, shape, or mixture of two components, etc. This research aims to explore image clustering using color composition. In order to complete this image clustering, three main components should be considered, which are color space, image representation (feature extraction), and clustering method itself. We aim to explore which composition of these factors will produce the best clustering results by combining various techniques from the three components. The color spaces use RGB, HSV, and L*a*b* method. The image representations use Histogram and Gaussian Mixture Model (GMM), whereas the clustering methods use KMeans and Agglomerative Hierarchical Clustering algorithm. The results of the experiment show that GMM representation is better combined with RGB and L*a*b* color space, whereas Histogram is better combined with HSV. The experiments also show that K-Means is better than Agglomerative Hierarchical for images clustering.

Speaker Identification by Joint Statistical Characterization in the Log Gabor Wavelet Domain

Real world Speaker Identification (SI) application differs from ideal or laboratory conditions causing perturbations that leads to a mismatch between the training and testing environment and degrade the performance drastically. Many strategies have been adopted to cope with acoustical degradation; wavelet based Bayesian marginal model is one of them. But Bayesian marginal models cannot model the inter-scale statistical dependencies of different wavelet scales. Simple nonlinear estimators for wavelet based denoising assume that the wavelet coefficients in different scales are independent in nature. However wavelet coefficients have significant inter-scale dependency. This paper enhances this inter-scale dependency property by a Circularly Symmetric Probability Density Function (CS-PDF) related to the family of Spherically Invariant Random Processes (SIRPs) in Log Gabor Wavelet (LGW) domain and corresponding joint shrinkage estimator is derived by Maximum a Posteriori (MAP) estimator. A framework is proposed based on these to denoise speech signal for automatic speaker identification problems. The robustness of the proposed framework is tested for Text Independent Speaker Identification application on 100 speakers of POLYCOST and 100 speakers of YOHO speech database in three different noise environments. Experimental results show that the proposed estimator yields a higher improvement in identification accuracy compared to other estimators on popular Gaussian Mixture Model (GMM) based speaker model and Mel-Frequency Cepstral Coefficient (MFCC) features.

A Video-based Algorithm for Moving Objects Detection at Signalized Intersection

Mixed-traffic (e.g., pedestrians, bicycles, and vehicles) data at an intersection is one of the essential factors for intersection design and traffic control. However, some data such as pedestrian volume cannot be directly collected by common detectors (e.g. inductive loop, sonar and microwave sensors). In this paper, a video based detection algorithm is proposed for mixed-traffic data collection at intersections using surveillance cameras. The algorithm is derived from Gaussian Mixture Model (GMM), and uses a mergence time adjustment scheme to improve the traditional algorithm. Real-world video data were selected to test the algorithm. The results show that the proposed algorithm has the faster processing speed and more accuracy than the traditional algorithm. This indicates that the improved algorithm can be applied to detect mixed-traffic at signalized intersection, even when conflicts occur.

Puff Noise Detection and Cancellation for Robust Speech Recognition

In this paper, an algorithm for detecting and attenuating puff noises frequently generated under the mobile environment is proposed. As a baseline system, puff detection system is designed based on Gaussian Mixture Model (GMM), and 39th Mel Frequency Cepstral Coefficient (MFCC) is extracted as feature parameters. To improve the detection performance, effective acoustic features for puff detection are proposed. In addition, detected puff intervals are attenuated by high-pass filtering. The speech recognition rate was measured for evaluation and confusion matrix and ROC curve are used to confirm the validity of the proposed system.

Improved Text-Independent Speaker Identification using Fused MFCC and IMFCC Feature Sets based on Gaussian Filter

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for speech related applications. On a recent contribution by authors, it has been shown that the Inverted Mel- Frequency Cepstral Coefficients (IMFCC) is useful feature set for SI, which contains complementary information present in high frequency region. This paper introduces the Gaussian shaped filter (GF) while calculating MFCC and IMFCC in place of typical triangular shaped bins. The objective is to introduce a higher amount of correlation between subband outputs. The performances of both MFCC & IMFCC improve with GF over conventional triangular filter (TF) based implementation, individually as well as in combination. With GMM as speaker modeling paradigm, the performances of proposed GF based MFCC and IMFCC in individual and fused mode have been verified in two standard databases YOHO, (Microphone Speech) and POLYCOST (Telephone Speech) each of which has more than 130 speakers.

A Nano-Scaled SRAM Guard Band Design with Gaussian Mixtures Model of Complex Long Tail RTN Distributions

This paper proposes, for the first time, how the challenges facing the guard-band designs including the margin assist-circuits scheme for the screening-test in the coming process generations should be addressed. The increased screening error impacts are discussed based on the proposed statistical analysis models. It has been shown that the yield-loss caused by the misjudgment on the screening test would become 5-orders of magnitude larger than that for the conventional one when the amplitude of random telegraph noise (RTN) caused variations approaches to that of random dopant fluctuation. Three fitting methods to approximate the RTN caused complex Gamma mixtures distributions by the simple Gaussian mixtures model (GMM) are proposed and compared. It has been verified that the proposed methods can reduce the error of the fail-bit predictions by 4-orders of magnitude.

Effects of Market Share and Diversification on Nonlife Insurers- Performance

The aim of this paper is to investigate the influence of market share and diversification on the nonlife insurers- performance. The underlying relationships have been investigated in different industries and different disciplines (economics, management...), still, no consistency exists either in the magnitude or statistical significance of the relationship between market share (and diversification as well) on one side and companies- performance on the other side. Moreover, the direction of the relationship is also somewhat questionable. While some authors find this relationship to be positive, the others reveal its negative association. In order to test the influence of market share and diversification on companies- performance in Croatian nonlife insurance industry for the period from 1999 to 2009, we designed an empirical model in which we included the following independent variables: firms- profitability from previous years, market share, diversification and control variables (i.e. ownership, industrial concentration, GDP per capita, inflation). Using the two-step generalized method of moments (GMM) estimator we found evidence of a positive and statistically significant influence of both, market share and diversification, on insurers- profitability.

Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Distant-talking voice-based HCI system suffers from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). Mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, affecting speech dynamics inside the room before reaching the microphones. Moreover, as the speech signal is reflected, its acoustical characteristic is also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach in dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room property are used to predict the optimal distance of the speech source. Consequently, pre-computed statistic priors corresponding to the optimal distance is selected to correct the statistics of the generic model which was frozen during training. Thus, model combinatorics are post-conditioned to match the power of instantaneous speech acoustics at runtime. This results to an improved likelihood in predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows voice recognition performance using our method is more robust to the change in distance compared to the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2: 2.5 meters), our method achieved 24.2% improvement in recognition performance against the best-performing conventional method.

The Determinants of Corporate Cash Holdings in Nigeria: Evidence from General Method of Moments (GMM)

The study examines the determinants of corporate cash holding of non-financial quoted firms in Nigeria using a sample of fifty four non-financial quoted firms listed on the Nigeria Stock Exchange for the period 1995-2009. Data were sourced from the Annual reports of the sampled firms and analyzed using Generalized Method of Moments(GMM). The study finds evidence supportive of a target adjustment model and that firms can not instantaneously adjust towards the target cash level owing to the fact that adjustment cost being costly,. Also, the result shows significant negative relationship between cash holdings and firm size, net working capital, return on asset and bank relationship and positive relationship with growth opportunities, leverage, inventories, account receivables and financial distress. Furthermore, there is no significant relationship between cash holdings and cash flow. In Nigerian setting, most of the variables that are relevant for explaining cash holdings in the Developed countries are found by this study to be relevant also in Nigeria.

Speaker Identification Using Admissible Wavelet Packet Based Decomposition

Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for speech recognition as well as speaker recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolution in low frequency region, and a low resolution in high frequency region. This kind of processing is good for obtaining stable phonetic information, but not suitable for speaker features that are located in high frequency regions. The speaker individual information, which is non-uniformly distributed in the high frequencies, is equally important for speaker recognition. Based on this fact we proposed an admissible wavelet packet based filter structure for speaker identification. Multiresolution capabilities of wavelet packet transform are used to derive the new features. The proposed scheme differs from previous wavelet based works, mainly in designing the filter structure. Unlike others, the proposed filter structure does not follow Mel scale. The closed-set speaker identification experiments performed on the TIMIT database shows improved identification performance compared to other commonly used Mel scale based filter structures using wavelets.

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.

Musical Instrument Classification Using Embedded Hidden Markov Models

In this paper, a novel method for recognition of musical instruments in a polyphonic music is presented by using an embedded hidden Markov model (EHMM). EHMM is a doubly embedded HMM structure where each state of the external HMM is an independent HMM. The classification is accomplished for two different internal HMM structures where GMMs are used as likelihood estimators for the internal HMMs. The results are compared to those achieved by an artificial neural network with two hidden layers. Appropriate classification accuracies were achieved both for solo instrument performance and instrument combinations which demonstrates that the new approach outperforms the similar classification methods by means of the dynamic of the signal.

Text-independent Speaker Identification Based on MAP Channel Compensation and Pitch-dependent Features

One major source of performance decline in speaker recognition system is channel mismatch between training and testing. This paper focuses on improving channel robustness of speaker recognition system in two aspects of channel compensation technique and channel robust features. The system is text-independent speaker identification system based on two-stage recognition. In the aspect of channel compensation technique, this paper applies MAP (Maximum A Posterior Probability) channel compensation technique, which was used in speech recognition, to speaker recognition system. In the aspect of channel robust features, this paper introduces pitch-dependent features and pitch-dependent speaker model for the second stage recognition. Based on the first stage recognition to testing speech using GMM (Gaussian Mixture Model), the system uses GMM scores to decide if it needs to be recognized again. If it needs to, the system selects a few speakers from all of the speakers who participate in the first stage recognition for the second stage recognition. For each selected speaker, the system obtains 3 pitch-dependent results from his pitch-dependent speaker model, and then uses ANN (Artificial Neural Network) to unite the 3 pitch-dependent results and 1 GMM score for getting a fused result. The system makes the second stage recognition based on these fused results. The experiments show that the correct rate of two-stage recognition system based on MAP channel compensation technique and pitch-dependent features is 41.7% better than the baseline system for closed-set test.

Person Identification using Gait by Combined Features of Width and Shape of the Binary Silhouette

Current image-based individual human recognition methods, such as fingerprints, face, or iris biometric modalities generally require a cooperative subject, views from certain aspects, and physical contact or close proximity. These methods cannot reliably recognize non-cooperating individuals at a distance in the real world under changing environmental conditions. Gait, which concerns recognizing individuals by the way they walk, is a relatively new biometric without these disadvantages. The inherent gait characteristic of an individual makes it irreplaceable and useful in visual surveillance. In this paper, an efficient gait recognition system for human identification by extracting two features namely width vector of the binary silhouette and the MPEG-7-based region-based shape descriptors is proposed. In the proposed method, foreground objects i.e., human and other moving objects are extracted by estimating background information by a Gaussian Mixture Model (GMM) and subsequently, median filtering operation is performed for removing noises in the background subtracted image. A moving target classification algorithm is used to separate human being (i.e., pedestrian) from other foreground objects (viz., vehicles). Shape and boundary information is used in the moving target classification algorithm. Subsequently, width vector of the outer contour of binary silhouette and the MPEG-7 Angular Radial Transform coefficients are taken as the feature vector. Next, the Principal Component Analysis (PCA) is applied to the selected feature vector to reduce its dimensionality. These extracted feature vectors are used to train an Hidden Markov Model (HMM) for identification of some individuals. The proposed system is evaluated using some gait sequences and the experimental results show the efficacy of the proposed algorithm.