Abstract: A state of the art Speaker Identification (SI) system
requires a robust feature extraction unit followed by a speaker
modeling scheme for generalized representation of these features.
Over the years, Mel-Frequency Cepstral Coefficients (MFCC)
modeled on the human auditory system has been used as a standard
acoustic feature set for speech related applications. On a recent
contribution by authors, it has been shown that the Inverted Mel-
Frequency Cepstral Coefficients (IMFCC) is useful feature set for
SI, which contains complementary information present in high
frequency region. This paper introduces the Gaussian shaped filter
(GF) while calculating MFCC and IMFCC in place of typical
triangular shaped bins. The objective is to introduce a higher
amount of correlation between subband outputs. The performances
of both MFCC & IMFCC improve with GF over conventional
triangular filter (TF) based implementation, individually as well as
in combination. With GMM as speaker modeling paradigm, the
performances of proposed GF based MFCC and IMFCC in
individual and fused mode have been verified in two standard
databases YOHO, (Microphone Speech) and POLYCOST
(Telephone Speech) each of which has more than 130 speakers.
Abstract: A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.