Abstract: Matching algorithms have significant importance in
speaker recognition. Feature vectors of the unknown utterance are
compared to feature vectors of the modeled speakers as a last step in
speaker recognition. A similarity score is found for every model in
the speaker database. Depending on the type of speaker recognition,
these scores are used to determine the speaker of the unknown speech
sample. For speaker verification, the similarity score is tested against a
predefined threshold, and either an acceptance or a rejection decision is
obtained. In the case of speaker identification, the result depends on
whether the identification is open set or closed set. In closed set
identification, the model that yields the best similarity score is
accepted. In open set identification, the best score is also tested against a
threshold, so there is one more possible outcome: the speaker is not
one of the speakers registered in the
existing database. This paper focuses on closed set speaker
identification using a modified version of a well-known matching
algorithm. The new matching algorithm shows improved
performance on the YOHO international speaker recognition database.
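The three decision rules described in this abstract can be sketched as follows. This is only a minimal illustration, not the paper's modified matching algorithm: it assumes that a higher score means a closer match, and the function names, speaker labels, and threshold values are hypothetical.

```python
from typing import Dict, Optional


def verify(score: float, threshold: float) -> bool:
    """Speaker verification: accept the claimed identity if the score
    reaches the threshold (assumption: higher score = more similar)."""
    return score >= threshold


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed set identification: return the model with the best score."""
    if not scores:
        raise ValueError("at least one speaker model is required")
    return max(scores, key=lambda speaker: scores[speaker])


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open set identification: the best model must also pass a threshold.
    None means the speaker is not one of the registered speakers."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


# Hypothetical similarity scores for one unknown utterance.
example_scores = {"speaker_a": 0.62, "speaker_b": 0.81, "speaker_c": 0.47}

closed_result = identify_closed_set(example_scores)
open_result_accept = identify_open_set(example_scores, threshold=0.70)
open_result_reject = identify_open_set(example_scores, threshold=0.90)
```

With a distance measure instead of a similarity score, the comparisons would be reversed (smallest value wins, and acceptance requires a value below the threshold).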
Abstract: Mel Frequency Cepstral Coefficient (MFCC) features
are widely used as acoustic features for speech recognition as well
as speaker recognition. In the MFCC feature representation, the Mel frequency
scale is used to get a high resolution in the low frequency region
and a low resolution in the high frequency region. This kind of processing
is good for obtaining stable phonetic information, but it is not suitable
for speaker features that are located in high frequency regions. The
speaker-specific information, which is non-uniformly distributed
in the high frequencies, is equally important for speaker recognition.
Based on this fact, we propose an admissible wavelet packet based
filter structure for speaker identification. Multiresolution capabilities
of wavelet packet transform are used to derive the new features.
The proposed scheme differs from previous wavelet based works,
mainly in designing the filter structure. Unlike others, the proposed
filter structure does not follow Mel scale. The closed-set speaker
identification experiments performed on the TIMIT database show
improved identification performance compared to other commonly
used Mel scale based filter structures using wavelets.
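To illustrate the resolution trade-off of the Mel scale mentioned in this abstract, the sketch below places filter center frequencies at equal steps on the Mel scale and reports their spacing in Hz. It assumes the commonly used conversion mel = 2595 log10(1 + f/700), which the abstract does not specify, and the filter count and the 0 to 8000 Hz band are arbitrary illustrative choices. It does not implement the proposed wavelet packet filter structure.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to Mel using a common convention (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of filters spaced uniformly on the Mel scale."""
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(n_filters)]


# Illustrative values only: 20 filters covering 0 to 8000 Hz.
centers = mel_center_frequencies(20, 0.0, 8000.0)
spacings = [hi - lo for lo, hi in zip(centers, centers[1:])]

# Neighbouring filters are close together at low frequencies (fine
# resolution) and far apart at high frequencies (coarse resolution).
print(f"spacing between the two lowest filters:  {spacings[0]:.1f} Hz")
print(f"spacing between the two highest filters: {spacings[-1]:.1f} Hz")
```

The growing spacing toward high frequencies is the coarse resolution that the abstract identifies as a drawback for speaker-specific information in that region.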
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure, which improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features, which are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set significantly outperforms baseline MFCC. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.
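One way to picture the parallel combination of speaker models mentioned in this abstract is score-level fusion, sketched below. The abstract does not state the fusion rule, so the weighted sum, the weight value, the function names, and the example scores are all assumptions made for illustration. Each dictionary stands for per-speaker scores (for example, average log-likelihoods) from one feature stream.

```python
from typing import Dict


def fuse_scores(
    mfcc_scores: Dict[str, float],
    complementary_scores: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of per-speaker scores from two parallel model sets.
    The weighted-sum rule is an assumption, not taken from the abstract."""
    if mfcc_scores.keys() != complementary_scores.keys():
        raise ValueError("both streams must score the same set of speakers")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return {
        speaker: weight * mfcc_scores[speaker]
        + (1.0 - weight) * complementary_scores[speaker]
        for speaker in mfcc_scores
    }


def identify(scores: Dict[str, float]) -> str:
    """Closed set decision: the speaker with the highest score."""
    return max(scores, key=lambda speaker: scores[speaker])


# Hypothetical log-likelihood scores for one test utterance.
mfcc_stream = {"speaker_a": -41.0, "speaker_b": -40.0}
complementary_stream = {"speaker_a": -38.0, "speaker_b": -42.0}

fused = fuse_scores(mfcc_stream, complementary_stream, weight=0.5)
mfcc_only_decision = identify(mfcc_stream)
fused_decision = identify(fused)
```

In this made-up example the MFCC stream alone narrowly favours one speaker, while the second stream shifts the fused decision to the other. It shows how complementary evidence can change the outcome, not that it does so on real data.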