Abstract: In this paper, we present a wavelet coefficient masking
approach based on Local Binary Patterns (WLBP) that enhances the
temporal spectra of the wavelet coefficients for speech enhancement.
This technique exploits the wavelet denoising scheme, which splits
the degraded speech into pyramidal subband components and extracts
frequency information without losing temporal information. Speech
enhancement in each high-frequency subband is performed by binary
labels through the local binary pattern masking that encodes the ratio
between the original value of each coefficient and the values of the
neighbour coefficients. This approach enhances the high-frequency
spectra of the wavelet transform instead of eliminating them through
a threshold. A comparative analysis is carried out with conventional
speech enhancement algorithms, demonstrating that the proposed
technique achieves significant improvements in terms of PESQ, an
internationally standardized objective measure for estimating
subjective speech quality. Informal listening tests also show that
the proposed method in an acoustic context improves the quality
of speech, avoiding the annoying musical noise present in other
speech enhancement techniques. Experimental results obtained with a
DNN-based speech recognizer in noisy environments corroborate the
superiority of the proposed scheme in the robust speech recognition
scenario.
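As a rough illustration of the two ingredients named above (a wavelet split and a local binary pattern over the resulting coefficients), the following sketch may help. It is not the authors' WLBP method: the Haar wavelet, the neighbourhood radius, the magnitude comparison rule, and the function names haar_split and lbp_codes are all assumptions made for this example, and the abstract does not say how the binary labels are turned into a mask, so that step is left out.

```python
import math

def haar_split(signal):
    """One level of a Haar wavelet transform: returns (approximation, detail).
    The Haar wavelet is an assumption; the abstract does not name the wavelet."""
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) / s for a, b in pairs]   # low-frequency subband
    detail = [(a - b) / s for a, b in pairs]   # high-frequency subband
    return approx, detail

def lbp_codes(coeffs, radius=2):
    """1-D local binary pattern (hypothetical rule): one bit per neighbour,
    set when the neighbour's magnitude is at least the centre's magnitude."""
    n = len(coeffs)
    offsets = list(range(-radius, 0)) + list(range(1, radius + 1))
    codes = []
    for i in range(n):
        centre = abs(coeffs[i])
        code = 0
        for offset in offsets:
            j = min(max(i + offset, 0), n - 1)  # clamp indices at the edges
            code = (code << 1) | (1 if abs(coeffs[j]) >= centre else 0)
        codes.append(code)
    return codes
```

Under this comparison rule, a code of 0 marks a coefficient whose magnitude exceeds that of every neighbour in the window, which is the kind of local information a mask could use instead of a global threshold.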
Abstract: In this paper, we present a comparative subjective analysis of the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, the Kalman filter, and the cascade of the IMCRA and Kalman filter algorithms. The performance of speech enhancement algorithms can be assessed in two ways. One is objective evaluation, in which speech quality parameters are computed from the signals. The other is a subjective listening test, in which the processed speech signal is presented to listeners who judge its quality on certain parameters. The authors previously compared these algorithms objectively in terms of Global SNR, Segmental SNR and Perceptual Evaluation of Speech Quality (PESQ), and reported a substantial increase in the objective parameters with the cascaded algorithms. Since subjective evaluation is the real test of a speech enhancement algorithm, this paper uses subjective analysis to verify the superiority of the cascaded algorithms over the individual IMCRA and Kalman algorithms. The results of the subjective listening tests confirm that the cascaded algorithms perform better under all tested noise conditions.
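For readers unfamiliar with the objective measures mentioned above, a minimal sketch of Global SNR and Segmental SNR follows. The frame length of 256 samples and the clamping of per-frame values to the range -10 dB to 35 dB are common conventions assumed here, not values taken from the paper, and the function names are hypothetical. PESQ is not sketched because it is a full standardized model rather than a formula.

```python
import math

def global_snr_db(clean, processed):
    """Global SNR in dB: clean-signal energy over error energy, whole signal."""
    signal_energy = sum(c * c for c in clean)
    noise_energy = sum((c - p) ** 2 for c, p in zip(clean, processed))
    if noise_energy == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_energy / noise_energy)

def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi] (assumed limits)."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        sig = sum(x * x for x in c)
        err = sum((x - y) ** 2 for x, y in zip(c, p))
        if sig == 0.0:
            continue  # skip silent frames
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        values.append(min(max(snr, lo), hi))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)
```

Because the segmental measure averages in the dB domain frame by frame, quiet speech frames weigh as much as loud ones, which is one reason the two measures can rank algorithms differently.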
Abstract: In this paper, a Least Mean Square (LMS) adaptive
noise reduction algorithm is proposed to enhance the speech signal
in noisy speech. The speech signal is enhanced by
varying the step size as a function of the input signal. Objective and
subjective measures are reported under various noise types for the proposed
and existing algorithms. The experimental results show that
the proposed LMS adaptive noise reduction algorithm reduces the Mean
Square Error (MSE) and Log Spectral Distance (LSD) compared with
earlier methods under various noise conditions and
different input SNR levels. In addition, the proposed algorithm
increases the Peak Signal to Noise Ratio (PSNR) and Segmental SNR
improvement (ΔSNRseg) values and improves the Mean Opinion Score
(MOS) compared with various existing LMS adaptive
noise reduction algorithms. These experimental results indicate
that the proposed LMS adaptive noise reduction algorithm
reduces speech distortion and residual noise compared with
the existing methods.
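The abstract does not give the step-size rule, so the sketch below substitutes the well-known normalized LMS (NLMS) update, in which the step size is scaled by the instantaneous input power. It is one example of a step size that varies with the input signal, not the proposed algorithm. The two-input noise canceller layout (a primary noisy channel and a noise reference), the function name, and the parameter defaults are assumptions for illustration.

```python
def nlms_noise_canceller(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """Adaptive noise canceller with an NLMS update (swapped-in technique).

    primary:   noisy speech samples (speech + filtered noise)
    reference: noise-only reference samples
    Returns (enhanced samples, final filter weights).
    """
    weights = [0.0] * taps
    buffer = [0.0] * taps            # most recent reference samples, newest first
    enhanced = []
    for d, x in zip(primary, reference):
        buffer = [x] + buffer[:-1]
        noise_estimate = sum(w * b for w, b in zip(weights, buffer))
        e = d - noise_estimate       # error signal doubles as the enhanced output
        power = sum(b * b for b in buffer)
        step = mu / (eps + power)    # step size depends on the input power
        weights = [w + step * e * b for w, b in zip(weights, buffer)]
        enhanced.append(e)
    return enhanced, weights
```

With a fixed step size, a loud input can make the update unstable and a quiet input makes it slow; normalizing by the input power is the usual way to balance the two.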
Abstract: This paper evaluates audio and speech quality
with the help of a digital audio watermarking technique under
different types of attacks (signal impairments) such as Gaussian noise,
compression error and jittering. Further, these attacks are
considered as a hostile environment. Audio and speech quality
evaluation is an important research topic. The traditional way to
evaluate speech quality is through subjective tests. They are reliable,
but very expensive and time consuming, and cannot be used in certain
applications such as online monitoring. Objective models, based on
human perception, were developed to predict the results of subjective
tests. The existing objective methods require either the original
speech or a complicated computational model, which makes some
applications of quality evaluation impossible.
Abstract: A new analysis of perceptual speech enhancement is
presented. It focuses on the fact that if only noise above the masking
threshold is filtered, then noise below the masking threshold, but
above the absolute threshold of hearing, can become audible after the
masker filtering. This particular drawback of some perceptual filters,
hereafter called the maskee-to-audible-noise (MAN) phenomenon,
favours the emergence of isolated tonals that increase musical noise.
Two filtering techniques that avoid or correct the MAN phenomenon
are proposed to effectively suppress background noise without introducing
much distortion. Experimental results, including objective
and subjective measurements, show that these techniques improve
the enhanced speech quality and the gain they bring emphasizes the
importance of the MAN phenomenon.
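The condition described above can be written as a per-bin check. The sketch below is only an illustration of the MAN definition, not either of the proposed filtering techniques. It assumes that all levels are given in dB per frequency bin, that the sub-threshold noise is left untouched by the filter, and that masking thresholds before and after filtering are already available; the function and parameter names are hypothetical.

```python
def man_bins(noise_db, mask_before_db, mask_after_db, absolute_db):
    """Indices of bins where noise was masked before filtering, lies above
    the absolute threshold of hearing, and exceeds the masking threshold
    that remains after the masker has been attenuated."""
    flagged = []
    levels = zip(noise_db, mask_before_db, mask_after_db, absolute_db)
    for k, (noise, before, after, absolute) in enumerate(levels):
        masked_before = noise <= before     # inaudible in the noisy input
        above_absolute = noise > absolute   # could be heard in quiet
        audible_after = noise > after       # masker no longer covers it
        if masked_before and above_absolute and audible_after:
            flagged.append(k)
    return flagged
```

Bins flagged this way are candidates for the isolated tonal components that the abstract associates with musical noise.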
Abstract: The voice signal in a Voice over Internet Protocol (VoIP) system is carried over a best-effort IP network, which leads to network degradations including delay, packet loss, and jitter. This paper presents the implementation of a finite impulse response (FIR) filter for voice quality improvement in the VoIP system using the distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with the AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates, and the performance of the enhanced VoIP signal is evaluated using the perceptual evaluation of speech quality (PESQ) measurement for narrowband signals. The results show a reduction in the computational complexity of the system and a significant improvement in the quality of the VoIP voice signal.
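Distributed arithmetic replaces the multipliers of an FIR filter with a lookup table of coefficient sums that is addressed bit-serially by the input samples. The sketch below shows the idea in software for integer coefficients and two's complement samples. The word length of 8 bits, the example structure, and all function names are assumptions; the paper's actual filter design and hardware mapping are not described in the abstract.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] for every bit k that is set in addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]

def da_dot(lut, window, bits):
    """Inner product of the coefficients with `window` using only LUT reads,
    shifts and adds. Samples must fit in `bits`-bit two's complement."""
    mask = (1 << bits) - 1
    words = [s & mask for s in window]          # two's complement bit patterns
    acc = 0
    for j in range(bits):
        addr = 0
        for k, w in enumerate(words):           # gather bit j of every sample
            addr |= ((w >> j) & 1) << k
        term = lut[addr] * (1 << j)
        acc += -term if j == bits - 1 else term  # sign bit has negative weight
    return acc

def da_fir(coeffs, signal, bits=8):
    """FIR filter computed with distributed arithmetic (zero initial state)."""
    lut = build_lut(coeffs)
    taps = len(coeffs)
    out = []
    for n in range(len(signal)):
        window = [signal[n - k] if n - k >= 0 else 0 for k in range(taps)]
        out.append(da_dot(lut, window, bits))
    return out

def direct_fir(coeffs, signal):
    """Reference multiply-accumulate FIR, used only to check da_fir."""
    return [sum(c * signal[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(signal))]
```

The table grows as 2 to the power of the number of taps, so practical designs usually split long filters into several smaller tables; that trade-off is not shown here.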
Abstract: The transformation of vocal characteristics aims at
modifying voice such that the intelligibility of aphonic voice is
increased or the voice characteristics of a speaker (source speaker) are
perceived as if another speaker (target speaker) had uttered the speech. In
this paper, the current state-of-the-art voice characteristics
transformation methodology is reviewed. Special emphasis is placed
on voice transformation methodology, and issues in improving the
intelligibility and naturalness of the transformed speech are
discussed. In particular, it is suggested to use the modulation theory
of speech as a base for research on high quality voice transformation.
This approach allows one to separate linguistic, expressive, organic
and perspective information of speech, based on an analysis of how
they are fused when speech is produced. Therefore, this theory
provides the fundamentals not only for manipulating non-linguistic,
extra-/paralinguistic and intra-linguistic variables for voice
transformation, but also for paving the way for easily transposing the
existing voice transformation methods to emotion-related voice
quality transformation and speaking style transformation. From the
perspectives of human speech production and perception, the popular
voice transformation techniques are described and classified
according to whether their underlying principles derive from speech
production mechanisms, perception mechanisms, or both. In addition, the advantages
and limitations of voice transformation techniques and the
experimental manipulation of vocal cues are discussed through
examples from past and present research. Finally, conclusions and a
road map toward more natural voice transformation algorithms are
presented.