Abstract: In this paper, we present a wavelet-coefficient masking
approach based on Local Binary Patterns (WLBP) to enhance the
temporal spectra of the wavelet coefficients for speech enhancement.
This technique exploits the wavelet denoising scheme, which splits
the degraded speech into pyramidal subband components and extracts
frequency information without losing temporal information. Speech
enhancement in each high-frequency subband is performed by binary
labels obtained through local binary pattern masking, which encodes
the ratio between the original value of each coefficient and the
values of the neighbouring coefficients. This approach enhances the high-frequency
spectra of the wavelet transform instead of eliminating them through
a threshold. A comparative analysis is carried out with conventional
speech enhancement algorithms, demonstrating that the proposed
technique achieves significant improvements in terms of PESQ, an
internationally standardized objective measure for estimating
subjective speech quality. Informal listening tests also show that
the proposed method in an acoustic context improves the quality
of speech, avoiding the annoying musical noise present in other
speech enhancement techniques. Experimental results obtained with a
DNN-based speech recognizer in noisy environments corroborate the
superiority of the proposed scheme in the robust speech recognition
scenario.
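The core idea, labelling each wavelet coefficient by comparison with its neighbours instead of hard-thresholding it, can be sketched as follows. This is an illustrative one-level Haar decomposition with a simplified two-neighbour binary label, not the authors' exact WLBP operator:

```python
import numpy as np

def haar_dwt(x):
    # One-level Haar transform: approximation (low-pass) and detail (high-pass).
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def lbp_mask(coeffs):
    # Binary label per detail coefficient: 1 when its magnitude is at least
    # the mean of its two neighbours' magnitudes (circular neighbourhood).
    c = np.abs(coeffs)
    neigh = (np.roll(c, 1) + np.roll(c, -1)) / 2
    return (c >= neigh).astype(int)

x = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 1.0, 0.8, 0.9])
a, d = haar_dwt(x)
labels = lbp_mask(d)
enhanced_d = d * labels   # keep labelled coefficients instead of thresholding
```

A full implementation would apply such a mask per subband of a multi-level decomposition and reconstruct with the inverse transform.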
Abstract: In this paper, we present a comparative subjective analysis of the Improved Minima Controlled Recursive Averaging (IMCRA) algorithm, the Kalman filter, and the cascade of the IMCRA and Kalman filter algorithms. The performance of speech enhancement algorithms can be predicted in two ways. One is objective evaluation, in which the speech quality parameters are predicted computationally. The other is a subjective listening test, in which the processed speech signal is presented to listeners who judge the quality of speech on certain parameters. The comparative objective evaluation of these algorithms was previously analyzed by the authors in terms of Global SNR, Segmental SNR, and Perceptual Evaluation of Speech Quality (PESQ), and it was reported that the cascaded algorithms yield a substantial increase in the objective parameters. Since subjective evaluation is the real test of a speech enhancement algorithm, the claimed superiority of the cascaded algorithms over the individual IMCRA and Kalman algorithms is verified through subjective analysis in this paper. The results of the subjective listening tests confirm that the cascaded algorithms perform better under all types of noise conditions.
Abstract: Numerous signal-processing-based speech enhancement systems have been proposed to improve intelligibility in the presence of noise. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A method is presented for recoding high-frequency speech components into a low-frequency region, to increase audibility for listeners with hearing loss. The purpose of the paper is to enhance the formants of the speech based on the Kaiser window. The pitch and formants of the signal are estimated using the autocorrelation, zero-crossing, and magnitude difference functions. The formant enhancement stage aims to restore the representation of formants at the level of the midbrain. MATLAB software is used for the implementation, and a low-complexity version of the system is developed.
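The autocorrelation-based pitch estimation mentioned above can be sketched minimally; the sampling rate, search range, and synthetic 200 Hz tone are illustrative choices, not the paper's settings:

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=60.0, fmax=400.0):
    # Estimate pitch as the autocorrelation peak within a plausible lag range.
    x = x - np.mean(x)
    r = np.correlate(x, x, mode='full')[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)           # lag search window
    lag = lo + np.argmax(r[lo:hi + 1])
    return fs / lag

fs = 8000
t = np.arange(0, 0.05, 1 / fs)
x = np.sin(2 * np.pi * 200 * t)    # synthetic 200 Hz "voiced" frame
f0 = pitch_autocorr(x, fs)
```

Real voiced speech would add windowing and voiced/unvoiced decisions, e.g. via the zero-crossing rate the abstract also mentions.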
Abstract: Speech enhancement is a long-standing problem with
numerous applications like teleconferencing, VoIP, hearing aids and
speech recognition. The motivation behind this research work is to
obtain a clean speech signal of higher quality by applying the optimal
noise cancellation technique. Real-time adaptive filtering algorithms
seem to be the best candidates among all categories of speech
enhancement methods. In this paper, we propose a speech
enhancement method based on Recursive Least Squares (RLS)
adaptive filter of speech signals. Experiments were performed on
noisy data prepared by adding AWGN, babble and pink
noise to clean speech samples at -5 dB, 0 dB, 5 dB and 10 dB SNR
levels. We then compare the noise cancellation performance of the
proposed RLS algorithm with the existing NLMS algorithm in terms of
Mean Squared Error (MSE), Signal to Noise ratio (SNR) and SNR
Loss. Based on the performance evaluation, the proposed RLS
algorithm was found to be the better noise cancellation
technique for speech signals.
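The RLS noise canceller described can be sketched as follows; the filter order, forgetting factor, and regularization constant are illustrative choices, and the reference channel is assumed to carry the noise alone:

```python
import numpy as np

def rls_filter(d, x, order=4, lam=0.99, delta=0.01):
    # Standard RLS adaptive noise canceller: d is the primary (noisy) input,
    # x is the noise reference; the a priori error is the enhanced speech.
    w = np.zeros(order)
    P = np.eye(order) / delta            # inverse-correlation estimate
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order + 1:n + 1][::-1] # regressor (most recent first)
        k = P @ u / (lam + u @ P @ u)    # gain vector
        e[n] = d[n] - w @ u              # enhanced-speech sample
        w = w + k * e[n]                 # weight update
        P = (P - np.outer(k, u @ P)) / lam
    return e

rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
clean = np.sin(2 * np.pi * 0.01 * np.arange(2000))
noisy = clean + 0.5 * noise              # primary channel
e = rls_filter(noisy, noise)             # reference channel: noise only
mse = np.mean((e[500:] - clean[500:]) ** 2)   # post-convergence error
```

After convergence the error signal approximates the clean speech, which is the sense in which the abstract reports MSE and SNR improvements.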
Abstract: In this paper, a Least Mean Square (LMS) adaptive
noise reduction algorithm is proposed to recover the speech signal
from noisy speech. The speech signal is enhanced by
varying the step size as a function of the input signal. Objective and
subjective measures are made under various noises for the proposed
and existing algorithms. From the experimental results, it is seen that
the proposed LMS adaptive noise reduction algorithm reduces Mean
square Error (MSE) and Log Spectral Distance (LSD) as compared to
that of the earlier methods under various noise conditions with
different input SNR levels. In addition, the proposed algorithm
increases the Peak Signal to Noise Ratio (PSNR) and Segmental SNR
improvement (ΔSNRseg) values, and improves the Mean Opinion Score
(MOS) as compared to that of the various existing LMS adaptive
noise reduction algorithms. From these experimental results, it is
observed that the proposed LMS adaptive noise reduction algorithm
reduces the speech distortion and residual noise as compared to that
of the existing methods.
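The variable-step idea, making the LMS step size a function of the input signal, is sketched here in its most common form, a power-normalized (NLMS-style) update; the order, step size, and two-channel setup are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def vs_lms(d, x, order=4, mu=0.1, eps=1e-6):
    # LMS whose effective step size varies with the input: the nominal step
    # mu is divided by the instantaneous regressor power.
    w = np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order - 1, len(d)):
        u = x[n - order + 1:n + 1][::-1]
        e[n] = d[n] - w @ u                    # enhanced-speech sample
        w += (mu / (eps + u @ u)) * e[n] * u   # input-dependent step size
    return e

rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
clean = np.sin(2 * np.pi * 0.01 * np.arange(2000))
noisy = clean + 0.5 * noise                    # primary channel
e = vs_lms(noisy, noise)                       # reference channel: noise
mse = np.mean((e[1000:] - clean[1000:]) ** 2)  # post-convergence error
```

Normalizing by the regressor power keeps the update stable when the input is loud and aggressive when it is quiet, which is the trade-off the abstract's variable step size targets.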
Abstract: Speech enhancement is the process of eliminating
noise and increasing the quality of a speech signal, which is
contaminated with other kinds of distortions. This paper focuses on
developing an optimum cascaded system for speech enhancement.
This aim is attained without diminishing any relevant speech
information and without much computational and time complexity.
LMS algorithm, Spectral Subtraction and Kalman filter have been
deployed as the main de-noising algorithms in this work. Since these
algorithms suffer from their respective shortcomings, this work
designs cascaded systems in different combinations and evaluates
these cascades through qualitative (listening) and quantitative
(SNR) tests.
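Of the three de-noising blocks named, spectral subtraction is the easiest to sketch; this toy frame-wise magnitude subtraction with half-wave rectification uses an illustrative frame size and test signals:

```python
import numpy as np

def spectral_subtraction(noisy, noise_ref, frame=256):
    # Frame-wise magnitude spectral subtraction; the noise magnitude is
    # taken from a reference noise frame, and negative magnitudes are
    # floored at zero (half-wave rectification).
    noise_mag = np.abs(np.fft.rfft(noise_ref[:frame]))
    out = np.zeros_like(noisy)
    for start in range(0, len(noisy) - frame + 1, frame):
        spec = np.fft.rfft(noisy[start:start + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Recombine the reduced magnitude with the noisy phase.
        out[start:start + frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), frame)
    return out

rng = np.random.default_rng(0)
n = 2048
clean = np.sin(2 * np.pi * 0.02 * np.arange(n))
noise = 0.3 * rng.standard_normal(n)
noisy = clean + noise
out = spectral_subtraction(noisy, noise)
```

In a cascade, the output of one such block feeds the next (e.g. LMS, then spectral subtraction, then a Kalman stage), which is the combination space the abstract explores.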
Abstract: This paper presents a new method for estimating the nonstationary
noise power spectral density given a noisy signal. The
method is based on averaging the noisy speech power spectrum using
time and frequency dependent smoothing factors. These factors are
adjusted based on signal-presence probability in individual frequency
bins. Signal presence is determined by computing the ratio of the
noisy speech power spectrum to its local minimum, which is updated
continuously by averaging past values of the noisy speech power
spectra with a look-ahead factor. This method adapts very quickly to
highly nonstationary noise environments. The proposed method
achieves significant improvements over a system that uses voice
activity detector (VAD) in noise estimation.
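The minimum-tracking scheme can be sketched schematically; the smoothing constants, presence threshold, and look-ahead factor below are illustrative stand-ins for the paper's values:

```python
import numpy as np

def noise_psd_track(frames_psd, alpha=0.8, beta=0.96, gamma=0.998):
    # Track a local minimum of the smoothed noisy PSD per frequency bin and
    # derive a crude signal-presence decision from the PSD-to-minimum ratio.
    S = frames_psd[0].copy()       # smoothed noisy PSD
    Smin = frames_psd[0].copy()    # running local minimum
    noise = frames_psd[0].copy()   # noise PSD estimate
    for t in range(1, len(frames_psd)):
        S = alpha * S + (1 - alpha) * frames_psd[t]
        # Continuous minimum tracking with a small look-ahead-style leak,
        # so the minimum can rise when the noise floor rises.
        Smin = np.where(S < Smin, S, gamma * Smin + (1 - gamma) * S)
        ratio = S / np.maximum(Smin, 1e-12)
        p = (ratio > 4.0).astype(float)     # speech present in this bin?
        a_t = beta + (1 - beta) * p         # presence freezes the update
        noise = a_t * noise + (1 - a_t) * S
    return noise

frames_psd = np.ones((100, 8))             # stationary unit-power noise
est = noise_psd_track(frames_psd)
```

Because the update continues in every frame (merely slowed when speech is present), no hard VAD decision is needed, which is the contrast the abstract draws.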
Abstract: A new analysis of perceptual speech enhancement is
presented. It focuses on the fact that if only noise above the masking
threshold is filtered, then noise below the masking threshold, but
above the absolute threshold of hearing, can become audible after the
masker filtering. This particular drawback of some perceptual filters,
hereafter called the maskee-to-audible-noise (MAN) phenomenon,
favours the emergence of isolated tones that increase musical noise.
Two filtering techniques that avoid or correct the MAN phenomenon
are proposed to effectively suppress background noise without introducing
much distortion. Experimental results, including objective
and subjective measurements, show that these techniques improve
the enhanced speech quality and the gain they bring emphasizes the
importance of the MAN phenomenon.
Abstract: In this paper, an algorithm for detecting and attenuating
puff noises frequently generated under the mobile environment is
proposed. As a baseline, a puff detection system is designed
based on a Gaussian Mixture Model (GMM), and 39-dimensional Mel-Frequency
Cepstral Coefficients (MFCCs) are extracted as feature parameters. To
improve the detection performance, effective acoustic features for puff
detection are proposed. In addition, detected puff intervals are
attenuated by high-pass filtering. The speech recognition rate was
measured for evaluation, and a confusion matrix and an ROC curve were
used to confirm the validity of the proposed system.
Abstract: Applications such as telecommunications, hands-free communication, and recording, which need at least one microphone, usually yield a signal infected by noise and echo. An important application is speech enhancement, which removes the noise and echo picked up by a microphone alongside the desired speech. Accordingly, the microphone signal has to be cleaned using digital signal processing (DSP) tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving speech by recovering the desired speech signal from noisy observations, especially in mobile communication. In this paper, we reconstruct a speech signal observed in additive background noise using the Kalman filter, estimating the parameters of the Autoregressive (AR) process in a state-space model; the output speech signal is obtained in MATLAB. Accurate Kalman estimation of the speech enhances it and reduces the noise; we then compare and discuss the results between the actual values and the estimated values that produce the reconstructed signals.
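A scalar sketch of the Kalman recursion for an AR(1) speech model; the AR coefficient and the process/measurement variances are assumed known here, whereas the paper estimates them:

```python
import numpy as np

def kalman_ar1(y, a, q, r):
    # Kalman filter for s[n] = a*s[n-1] + w[n] (variance q),
    # observed as y[n] = s[n] + v[n] (variance r).
    s_hat, p = 0.0, q / (1 - a * a)        # start at stationary variance
    out = np.empty(len(y))
    for n, yn in enumerate(y):
        s_pred = a * s_hat                 # time update (predict)
        p_pred = a * a * p + q
        k = p_pred / (p_pred + r)          # Kalman gain
        s_hat = s_pred + k * (yn - s_pred) # measurement update
        p = (1.0 - k) * p_pred
        out[n] = s_hat
    return out

rng = np.random.default_rng(0)
a, q, r = 0.95, 0.01, 0.1
s = np.zeros(2000)
for n in range(1, 2000):                   # synthetic AR(1) "speech"
    s[n] = a * s[n - 1] + np.sqrt(q) * rng.standard_normal()
y = s + np.sqrt(r) * rng.standard_normal(2000)
s_est = kalman_ar1(y, a, q, r)
mse_raw = np.mean((y - s) ** 2)            # error of the raw observation
mse_kf = np.mean((s_est - s) ** 2)         # error after filtering
```

Real speech would use a higher-order AR model in companion (state-space) form, with the AR coefficients re-estimated frame by frame.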
Abstract: In this paper we present an enhanced noise reduction method for robust speech recognition using an Adaptive Gain Equalizer with Nonlinear Spectral Subtraction. In the Adaptive Gain Equalizer (AGE) method, the input signal is divided into a number of subbands that are individually weighted in the time domain, according to a short-time Signal-to-Noise Ratio (SNR) estimate in each subband at every time instant. The focus is on speech enhancement rather than on noise suppression alone. When analysis was done under various noise conditions for speech recognition, the AGE algorithm was found to have an obvious failing point at an SNR of -5 dB, with inadequate levels of noise suppression for SNRs below this point. This work proposes the implementation of AGE coupled with Nonlinear Spectral Subtraction (AGE-NSS) for robust speech recognition. The experimental results show that our AGE-NSS outperforms the AGE when the SNR drops below the -5 dB level.
Abstract: Distant-talking voice-based HCI system suffers from
performance degradation due to mismatch between the acoustic
speech (runtime) and the acoustic model (training). Mismatch is
caused by the change in the power of the speech signal as observed at
the microphones. This change is greatly influenced by the change in
distance, affecting speech dynamics inside the room before reaching
the microphones. Moreover, as the speech signal is reflected, its
acoustical characteristic is also altered by the room properties. In
general, power mismatch due to distance is a complex problem. This
paper presents a novel approach in dealing with distance-induced
mismatch by intelligently sensing instantaneous voice power variation
and compensating model parameters. First, the distant-talking speech
signal is processed through microphone array processing, and the
corresponding distance information is extracted. Distance-sensitive
Gaussian Mixture Models (GMMs), pre-trained to capture both
speech power and room property are used to predict the optimal
distance of the speech source. Consequently, pre-computed statistical
priors corresponding to the optimal distance are selected to correct
the statistics of the generic model which was frozen during training.
Thus, model combinatorics are post-conditioned to match the power
of instantaneous speech acoustics at runtime. This results in an
improved likelihood of predicting the correct speech command at
farther distances. We experiment using real data recorded inside two
rooms. Experimental evaluation shows voice recognition performance
using our method is more robust to the change in distance compared
to the conventional approach. In our experiment, under the most
acoustically challenging environment (i.e., Room 2: 2.5 meters), our
method achieved 24.2% improvement in recognition performance
against the best-performing conventional method.
Abstract: In this work, we are interested in developing a speech denoising tool using the discrete wavelet packet transform (DWPT). This speech denoising tool will be employed in recognition, coding, and synthesis applications. For noise reduction, instead of applying the classical thresholding technique, some wavelet packet nodes are set to zero and the others are thresholded. To estimate the nonstationary noise level, we employ the spectral entropy. A comparison of our proposed technique with classical denoising methods based on thresholding and spectral subtraction is made in order to evaluate our approach. The experimental implementation uses speech signals corrupted by two sorts of noise: white noise and Volvo noise. Listening tests show that our proposed technique is better than spectral subtraction, and SNR computations show the superiority of our technique over the classical thresholding method using the modified hard thresholding function based on the u-law algorithm.
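The spectral-entropy cue used to estimate the noise level can be sketched directly; the normalization and the test signals are illustrative:

```python
import numpy as np

def spectral_entropy(x):
    # Shannon entropy of the normalized power spectrum, scaled to [0, 1].
    # Flat (noise-like) spectra give high entropy; tonal/speech-like
    # spectra concentrate power in few bins and give low entropy.
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p / np.sum(p)
    p = np.maximum(p, 1e-12)        # avoid log(0)
    return -np.sum(p * np.log2(p)) / np.log2(len(p))

rng = np.random.default_rng(1)
h_noise = spectral_entropy(rng.standard_normal(512))          # noise frame
h_tone = spectral_entropy(np.sin(2 * np.pi * 0.1 * np.arange(512)))  # tonal frame
```

Frames with entropy near 1 can then be treated as noise-dominated and used to update the noise-level estimate that drives the node zeroing and thresholding.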
Abstract: Background noise is particularly damaging to speech
intelligibility for people with hearing loss, especially for
sensorineural loss patients. Several investigations of speech
intelligibility have demonstrated that sensorineural loss patients
need a 5-15 dB higher SNR
than normal-hearing subjects. This paper describes a Discrete
Cosine Transform Power-Normalized Least Mean Square algorithm
to improve the SNR and the convergence speed of the LMS
for sensorineural loss patients. Since it requires only real arithmetic,
it establishes a faster convergence rate compared to the time-domain
LMS, and the transformation also improves the eigenvalue
distribution of the input autocorrelation matrix of the LMS filter.
The DCT has good orthonormality, separability, and energy compaction
properties. Although the DCT does not separate frequencies, it is a
powerful signal decorrelator. It is a real valued function and thus
can be effectively used in real-time operation. The advantages of
DCT-LMS as compared to standard LMS algorithm are shown via
SNR and eigenvalue ratio computations. Exploiting the symmetry
of the basis functions, the DCT transform matrix [AN] can be
factored into a series of ±1 butterflies and rotation angles. This
factorization yields one of the fastest DCT implementations. There
are different ways to obtain such factorizations; this work uses the
fast factored DCT algorithm developed by Chen et al. The
computer simulation results show the superior convergence
characteristics of the proposed algorithm, improving the SNR by at
least 10 dB for input SNRs less than or equal to 0 dB, with faster
convergence speed and better time and frequency characteristics.
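A schematic DCT-domain LMS of the kind described: the tap-input vector is rotated by an orthonormal DCT-II matrix and each transformed tap receives a power-normalized step size. The order, step size, and the noise-cancellation test setup are illustrative, and the direct matrix multiply stands in for the fast butterfly factorization the abstract cites:

```python
import numpy as np

def dct_lms(d, x, order=8, mu=0.02, eps=1e-6):
    # Transform-domain LMS: rotating the regressor with the DCT and
    # normalizing per bin whitens the eigenvalue spread seen by adaptation.
    k = np.arange(order)
    C = np.sqrt(2.0 / order) * np.cos(
        np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * order))
    C[0] /= np.sqrt(2.0)                     # orthonormal DCT-II matrix
    w = np.zeros(order)
    power = np.ones(order)                   # per-bin input power estimate
    e = np.zeros(len(d))
    for n in range(order - 1, len(d)):
        u = C @ x[n - order + 1:n + 1][::-1] # DCT-rotated regressor
        power = 0.99 * power + 0.01 * u * u
        e[n] = d[n] - w @ u                  # enhanced-speech sample
        w += mu * e[n] * u / (power + eps)   # power-normalized update
    return e

rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
clean = np.sin(2 * np.pi * 0.01 * np.arange(2000))
noisy = clean + 0.5 * noise
e = dct_lms(noisy, noise)
mse = np.mean((e[1000:] - clean[1000:]) ** 2)
```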
Abstract: This work presents a fusion of Log Gabor Wavelet
(LGW) and Maximum a Posteriori (MAP) estimator as a speech
enhancement tool for acoustical background noise reduction. The
probability density function (pdf) of the speech spectral amplitude is
approximated by a Generalized Laplacian Distribution (GLD).
Compared to earlier estimators, the proposed method estimates the
underlying statistical model more accurately by appropriately
choosing the model parameters of GLD. Experimental results show
that the proposed estimator yields a higher improvement in
Segmental Signal-to-Noise Ratio (S-SNR) and lower Log-Spectral
Distortion (LSD) in two different noisy environments compared to
other estimators.
Abstract: This paper presents a source extraction system that can extract only target signals, with constraints on source localization, in online systems. The proposed system is a method for enhancing a target signal while suppressing other interfering signals; its performance is superior to that of other methods, and the extraction of the target source is comparatively complete. The method has a beamforming concept and uses an improved time-frequency (TF) mask-based BSS algorithm to separate a target signal from multiple noise sources. The target sources are assumed to be in front, and the test data were recorded in a reverberant room. The experimental results of the proposed method were evaluated by the PESQ scores of real-recording sentences and showed a noticeable speech enhancement.
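The TF-mask mechanics can be illustrated with a deliberately idealized two-source example; here the binary mask is computed from the sources themselves purely to show the principle, whereas the paper derives it from the microphone-array (beamformed) signals:

```python
import numpy as np

def stft(x, frame=128):
    # Non-overlapping rectangular-window STFT, enough for this illustration.
    n = len(x) // frame
    return np.array([np.fft.rfft(x[i * frame:(i + 1) * frame])
                     for i in range(n)])

def istft(X, frame=128):
    return np.concatenate([np.fft.irfft(f, frame) for f in X])

# Two pure tones on exact FFT bins stand in for the target (front) source
# and an interferer; a real mixture would come from room recordings.
t = np.arange(1024)
target = np.sin(2 * np.pi * 8 * t / 128)    # bin 8 of a 128-point frame
interf = np.sin(2 * np.pi * 32 * t / 128)   # bin 32
mix = target + interf

# Binary TF mask: keep a time-frequency cell where the target dominates.
mask = (np.abs(stft(target)) > np.abs(stft(interf))).astype(float)
extracted = istft(mask * stft(mix))
```

Because the two tones occupy disjoint TF cells, masking the mixture recovers the target almost exactly; real speech sources overlap, which is why the paper combines the mask with beamforming.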