Abstract: Rapid progress in audio compression technology has contributed to the explosive growth of music available in digital form today. In a reversal of ideas, this work makes use of a recently proposed efficient audio compression scheme to develop three important applications in the context of Music Information Retrieval (MIR) for the effective manipulation of large music databases, namely automatic music recommendation (AMR), digital rights management (DRM) and audio fingerprinting for song identification. The performance of these three applications has been evaluated on a database of songs collected from a diverse set of genres.
Abstract: Real-world Speaker Identification (SI) applications
differ from ideal or laboratory conditions, causing perturbations that
lead to a mismatch between the training and testing environments
and degrade performance drastically. Many strategies have been
adopted to cope with acoustical degradation; the wavelet-based Bayesian
marginal model is one of them. However, Bayesian marginal models
cannot model the inter-scale statistical dependencies of different
wavelet scales. Simple nonlinear estimators for wavelet based
denoising assume that the wavelet coefficients in different scales are
independent in nature. However, wavelet coefficients have significant
inter-scale dependency. This paper exploits this inter-scale
dependency property through a Circularly Symmetric Probability Density
Function (CS-PDF) related to the family of Spherically Invariant
Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain,
and the corresponding joint shrinkage estimator is derived by Maximum
a Posteriori (MAP) estimation. A framework based on these is proposed
to denoise speech signals for automatic speaker identification
problems. The robustness of the proposed framework is tested for
a Text-Independent Speaker Identification application on 100 speakers
from the POLYCOST and 100 speakers from the YOHO speech databases in three
different noise environments. Experimental results show that the
proposed estimator yields a higher improvement in identification
accuracy than other estimators with the popular Gaussian Mixture
Model (GMM) based speaker model and Mel-Frequency Cepstral
Coefficient (MFCC) features.
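As an illustration of what a joint (inter-scale) shrinkage rule looks like, the sketch below implements the bivariate shrinkage function of Sendur and Selesnick, a well-known MAP estimator derived from a circularly symmetric prior over a wavelet coefficient and its parent at the next coarser scale. It is a stand-in for illustration only and is not the CS-PDF estimator of the paper; the names `bivariate_shrink`, `sigma_noise`, and `sigma_signal` and the toy values are assumptions.

```python
import numpy as np

def bivariate_shrink(child, parent, sigma_noise, sigma_signal):
    """Shrink noisy child coefficients using their parents (Sendur-Selesnick rule).

    The gain depends on the joint magnitude sqrt(child^2 + parent^2), so a small
    child coefficient survives when its parent is large.
    """
    magnitude = np.sqrt(child**2 + parent**2)
    threshold = np.sqrt(3.0) * sigma_noise**2 / sigma_signal
    # Soft-threshold the joint magnitude; guard against division by zero.
    gain = np.maximum(magnitude - threshold, 0.0) / np.maximum(magnitude, 1e-12)
    return gain * child

# Toy coefficients (hypothetical values): same small child, different parents.
child = np.array([0.1, 0.1, 2.0])
parent = np.array([0.0, 3.0, 0.0])
denoised = bivariate_shrink(child, parent, sigma_noise=0.5, sigma_signal=1.0)
```

A marginal (scale-independent) rule would treat the first two coefficients identically; the joint rule zeroes the first and keeps most of the second because its parent is large.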
Abstract: A state-of-the-art Speaker Identification (SI) system
requires a robust feature extraction unit followed by a speaker
modeling scheme for generalized representation of these features.
Over the years, Mel-Frequency Cepstral Coefficients (MFCC),
modeled on the human auditory system, have been used as a standard
acoustic feature set for speech-related applications. In a recent
contribution by the authors, it was shown that the Inverted Mel-
Frequency Cepstral Coefficients (IMFCC) are a useful feature set for
SI, containing complementary information present in the
high-frequency region. This paper introduces the Gaussian-shaped filter
(GF) while calculating MFCC and IMFCC in place of typical
triangular shaped bins. The objective is to introduce a higher
amount of correlation between subband outputs. The performance
of both MFCC and IMFCC improves with GF over the conventional
triangular filter (TF) based implementation, individually as well as
in combination. With GMM as the speaker modeling paradigm, the
performance of the proposed GF-based MFCC and IMFCC in
individual and fused modes has been verified on two standard
databases, YOHO (microphone speech) and POLYCOST
(telephone speech), each of which has more than 130 speakers.
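The difference between triangular and Gaussian-shaped mel filters can be sketched as follows. This is a minimal illustration, not the paper's implementation: the spread rule `sigma = (next_center - center) / alpha`, the value `alpha = 2.0`, the 20-filter bank, and the 8 kHz sampling rate are all assumptions, as are the function names.

```python
import numpy as np

def mel_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 mel-spaced boundary frequencies in Hz."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mels = np.linspace(to_mel(f_low), to_mel(f_high), n_filters + 2)
    return 700.0 * (10.0 ** (mels / 2595.0) - 1.0)

def triangular_bank(freqs, edges):
    """Conventional triangular filters: zero outside [edges[i-1], edges[i+1]]."""
    bank = np.zeros((len(edges) - 2, len(freqs)))
    for i in range(1, len(edges) - 1):
        lo, c, hi = edges[i - 1], edges[i], edges[i + 1]
        rising = (freqs - lo) / (c - lo)
        falling = (hi - freqs) / (hi - c)
        bank[i - 1] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def gaussian_bank(freqs, edges, alpha=2.0):
    """Gaussian filters centered on the same points, with smoothly decaying tails."""
    bank = np.zeros((len(edges) - 2, len(freqs)))
    for i in range(1, len(edges) - 1):
        c = edges[i]
        sigma = (edges[i + 1] - c) / alpha  # assumed spread rule
        bank[i - 1] = np.exp(-0.5 * ((freqs - c) / sigma) ** 2)
    return bank

def adjacent_overlap(bank):
    """Mean cosine similarity between neighboring filter shapes."""
    unit = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return float(np.mean(np.sum(unit[:-1] * unit[1:], axis=1)))

freqs = np.linspace(0.0, 4000.0, 257)  # FFT bin frequencies for 8 kHz audio
edges = mel_edges(20, 0.0, 4000.0)
tf = triangular_bank(freqs, edges)
gf = gaussian_bank(freqs, edges)
```

Comparing `adjacent_overlap(gf)` with `adjacent_overlap(tf)` gives a rough measure of how much more neighboring subbands share under the Gaussian shape, which is the stated motivation for the change.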
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.
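A complementary filter bank of this kind can be sketched by mirroring a standard mel filter bank, so that the narrow, densely packed filters move to the high-frequency end. This is a minimal sketch that assumes the complementary structure is obtained by reversing both the filter order and the frequency axis; the function names, the 20-filter bank, and the 8 kHz sampling rate are illustrative assumptions.

```python
import numpy as np

def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Triangular mel filter bank of shape (n_filters, n_bins) over 0..sample_rate/2."""
    nyquist = sample_rate / 2.0
    freqs = np.linspace(0.0, nyquist, n_bins)
    mel_max = 2595.0 * np.log10(1.0 + nyquist / 700.0)
    mels = np.linspace(0.0, mel_max, n_filters + 2)
    edges = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, c, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (c - lo)
        falling = (hi - freqs) / (hi - c)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return freqs, bank

def inverted_filter_bank(bank):
    """Mirror the bank: reverse filter order and flip the frequency axis."""
    return bank[::-1, ::-1].copy()

def bandwidths(freqs, bank):
    """Approximate width in Hz of each filter's nonzero support."""
    step = freqs[1] - freqs[0]
    return (bank > 0).sum(axis=1) * step

freqs, mel_bank = mel_filter_bank(20, 257, 8000)
inv_bank = inverted_filter_bank(mel_bank)
bw_mel = bandwidths(freqs, mel_bank)  # narrow at low frequencies, wide at high
bw_inv = bandwidths(freqs, inv_bank)  # wide at low frequencies, narrow at high
```

Cepstral features would then be computed from each bank in the usual way (log filter energies followed by a DCT), with one speaker model trained per feature set and the two model scores combined at test time.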
Abstract: This work presents a fusion of the Log Gabor Wavelet
(LGW) and a Maximum a Posteriori (MAP) estimator as a speech
enhancement tool for acoustical background noise reduction. The
probability density function (pdf) of the speech spectral amplitude is
approximated by a Generalized Laplacian Distribution (GLD).
Compared to earlier estimators, the proposed method estimates the
underlying statistical model more accurately by appropriately
choosing the model parameters of the GLD. Experimental results show
that the proposed estimator yields a higher improvement in
Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral
Distortion (LSD) in two different noisy environments compared to
other estimators.
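The two evaluation measures named above can be computed roughly as follows. This is a sketch using common definitions; the frame length, hop size, Hann window, clipping range of -10 to 35 dB, and the synthetic test signal are assumptions, not settings taken from the paper.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def segmental_snr(clean, enhanced, frame_len=256, hop=128, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, with each frame clipped to [lo, hi]."""
    eps = 1e-12
    c = frame_signal(clean, frame_len, hop)
    e = frame_signal(enhanced, frame_len, hop)
    signal_energy = np.sum(c**2, axis=1) + eps
    error_energy = np.sum((c - e) ** 2, axis=1) + eps
    snr = 10.0 * np.log10(signal_energy / error_energy)
    return float(np.mean(np.clip(snr, lo, hi)))

def log_spectral_distortion(clean, enhanced, frame_len=256, hop=128):
    """Mean per-frame RMS difference between log-magnitude spectra, in dB."""
    eps = 1e-12
    win = np.hanning(frame_len)
    c_mag = np.abs(np.fft.rfft(frame_signal(clean, frame_len, hop) * win, axis=1))
    e_mag = np.abs(np.fft.rfft(frame_signal(enhanced, frame_len, hop) * win, axis=1))
    diff = 20.0 * np.log10(c_mag + eps) - 20.0 * np.log10(e_mag + eps)
    return float(np.mean(np.sqrt(np.mean(diff**2, axis=1))))

# Synthetic example: one second of a 440 Hz tone at 8 kHz with two noise levels.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noise = rng.standard_normal(t.size)
noisy = clean + 0.3 * noise
less_noisy = clean + 0.03 * noise
```

A better enhancement result should raise `segmental_snr` and lower `log_spectral_distortion` relative to the unprocessed noisy signal, as `less_noisy` does relative to `noisy` here.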
Abstract: The Artificial Neural Network (ANN) has been
extensively used for classification of heart sounds for its
discriminative training ability and easy implementation. However, it
suffers from overparameterization if the number of nodes is not
chosen properly. In such cases, when the dataset has redundancy
within it, the ANN is trained on this redundant information, which
results in poor validation. Also, a larger network means more
computational expense, resulting in higher hardware and time-related
cost. Therefore, an optimum neural network design is needed
for real-time detection of pathological patterns, if any, in the heart
sound signal. The aims of this work are to (i) select a set of input
features that are effective for identification of heart sound signals and
(ii) ensure an optimum selection of nodes in the hidden layer for a
more effective ANN structure. Here, we present an optimization
technique that involves Singular Value Decomposition (SVD) and
QR factorization with column pivoting (QRcp) methodology to
optimize an empirically chosen, over-parameterized ANN structure.
The input nodes of the ANN structure are optimized by SVD followed
by QRcp, while only SVD is required to prune undesirable hidden
nodes. Results are presented for classifying 12 common
pathological cases and normal heart sounds.
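The SVD followed by QRcp step for input selection can be sketched as a standard subset-selection procedure: SVD estimates how many independent directions the feature matrix contains, and QR with column pivoting on the leading right singular vectors picks which original features to keep. This is a sketch under assumptions; the relative tolerance `tol`, the function names, and the synthetic feature matrix are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.linalg import qr

def effective_rank(matrix, tol=1e-8):
    """Count singular values that are significant relative to the largest one."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def select_columns(matrix, rank):
    """Pick `rank` informative columns via QR with column pivoting on V_r^T."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    _, _, perm = qr(vt[:rank, :], pivoting=True)
    return np.sort(perm[:rank])

# Synthetic features (hypothetical): 5 inputs, of which only 3 are independent.
rng = np.random.default_rng(0)
base = rng.standard_normal((200, 3))
features = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1],  # redundant: sum of the first two inputs
    base[:, 2],
    2.0 * base[:, 2],         # redundant: scaled copy of the fourth input
])

rank = effective_rank(features)
selected = select_columns(features, rank)
```

Applying `effective_rank` to the matrix of hidden-layer outputs (one column per hidden node) would give a comparable estimate of how many hidden nodes carry independent information, which is one way to read the SVD-only pruning step for the hidden layer.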