Abstract: People have a habitual pitch level that they use in ordinary speech. However, this pitch changes irregularly in the presence of noise, so pitch can be used to estimate the SNR of a speech signal. In this paper, we compute the energy of the input speech signal and detect stationary regions in voiced speech. We then obtain the pitch period by NAMDF for each stationary region, where the pitch does not vary rapidly. After the pitch is obtained, each frame is divided into pitch periods and the likelihood of adjacent pitch periods is estimated. We propose a new parameter, NLF, to estimate the SNR of the received speech signal. The NLF is derived from the correlation of neighboring pitch periods and is obtained for each stationary region in voiced speech. Finally, we confirm that the method estimates the SNR of received speech well in the presence of noise.
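The pitch step described above can be illustrated with a minimal sketch. NAMDF is not expanded in the abstract; the code assumes it is a normalized variant of the average magnitude difference function (AMDF) and uses the plain AMDF for simplicity. The names `amdf`, `estimate_pitch_period`, and `period_similarity` are hypothetical, and the normalized correlation between neighboring periods is only a stand-in for the idea behind the NLF, not the paper's definition.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that minimizes the AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

def period_similarity(frame, period):
    """Normalized correlation between the first two pitch periods (illustrative only)."""
    a = frame[:period]
    b = frame[period:2 * period]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

# Synthetic "voiced" frame: a 100 Hz tone sampled at 8 kHz, so one period is 80 samples.
fs, f0 = 8000, 100
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

# The search range is kept below two periods so that only one AMDF minimum is inside it.
period = estimate_pitch_period(frame, min_lag=20, max_lag=120)
similarity = period_similarity(frame, period)
```

For a clean periodic frame the similarity is close to 1; additive noise lowers it, which is the intuition behind using the correlation of neighboring pitch periods as an SNR indicator.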
Abstract: In this paper, we use Radial Basis Function Networks
(RBFN) to solve the problem of cancelling environmental
interference in speech signals. We show that the Second Order Thin-
Plate Spline (SOTPS) kernel cancels the interference effectively.
For comparison, we also run our experiments with the two most
commonly used conventional RBFN kernels: the Gaussian and First
Order TPS (FOTPS) basis functions. The speech signals used here were
taken from the OGI Multi-Language Telephone Speech Corpus database
and were corrupted with six types of environmental noise from the
NOISEX-92 database. Experimental results show that the SOTPS kernel
can considerably outperform the Gaussian and FOTPS functions on
the speech interference cancellation problem.
Abstract: This paper presents a new strategy for the identification
and classification of pathological voices using a hybrid method
based on the wavelet transform and neural networks. After speech
acquisition from a patient, the speech signal is analysed in order to
extract acoustic parameters such as the pitch, the formants, jitter,
and shimmer. The results obtained are compared with normal,
standard values using a programmable database. Sounds are
collected from normal people and patients, and then classified into
two different categories. The speech database consists of several
pathological and normal voices collected from the national hospital
“Rabta-Tunis”. The speech processing algorithm is run in a
supervised mode to discriminate between normal and pathological
voices and then to classify between neural and vocal pathologies
(Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation
results are presented as a function of the disease and are
compared with the clinical diagnosis in order to have an objective
evaluation of the developed tool.
Abstract: Noise cancellation is an essential component of many DSP
applications. Changes in real-time signals are rapid. In noise
cancellation, a reference signal that approximates the noise signal
(which corrupts the original information signal) is obtained and then
subtracted from the noise-bearing signal to obtain a noise-free
signal. This approximation of the noise signal is obtained through
adaptive filters, which are self-adjusting. Because the changes in
real-time signals are abrupt, this requires an adaptive algorithm
that converges fast and is stable. Least mean square (LMS) and
normalized LMS (NLMS) are two widely used algorithms because of
their simplicity in calculation and implementation, but they
converge slowly. Adaptive averaging filters (AFA) are also used
because they converge quickly, but they are less stable. This paper
provides a comparative study of LMS, NLMS, AFA, and new enhanced
average adaptive (Average NLMS, ANLMS) filters for noise cancelling
applications using speech signals.
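The reference-based cancellation scheme described above can be sketched with a textbook NLMS filter. This is only the standard NLMS update, not the AFA or ANLMS variants compared in the paper, whose details the abstract does not give. The function name `nlms_cancel`, the step size, the filter length, the FIR noise path, and the sinusoidal stand-in for speech are all assumptions made for illustration.

```python
import math
import random

def nlms_cancel(primary, reference, taps=4, mu=0.02, eps=1e-8):
    """Adaptive noise canceller: the error signal is the cleaned output."""
    w = [0.0] * taps            # adaptive filter weights
    buf = [0.0] * taps          # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # estimate of the noise in d
        e = d - y                                    # cleaned sample
        norm = eps + sum(b * b for b in buf)         # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w

def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

random.seed(0)
n = 4000
speech = [math.sin(2 * math.pi * 0.01 * k) for k in range(n)]   # stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(n)]              # reference microphone
path = [0.6, -0.3, 0.1]   # hypothetical path from the noise source to the primary mic
corrupt = [sum(path[j] * noise[k - j] for j in range(len(path)) if k - j >= 0)
           for k in range(n)]
primary = [s + c for s, c in zip(speech, corrupt)]

cleaned, weights = nlms_cancel(primary, noise)
err_before = mse(primary[-1000:], speech[-1000:])   # error without cancellation
err_after = mse(cleaned[-1000:], speech[-1000:])    # error after convergence
```

A larger `mu` converges faster but leaves more residual error, which is the speed versus stability trade-off the abstract refers to.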
Abstract: Different pseudo-random or pseudo-noise (PN) sequences, as well as orthogonal sequences, that can be used as spreading codes for code division multiple access (CDMA) cellular networks or for encrypting speech signals to reduce the residual intelligibility are investigated. We briefly review the theoretical background of direct sequence CDMA systems and describe the main characteristics of the maximal length, Gold, Barker, and Kasami sequences. We also discuss variable- and fixed-length orthogonal codes such as Walsh-Hadamard codes. The equivalence of PN and orthogonal codes is also derived. Finally, a new PN sequence is proposed which is shown to have certain better properties than the existing codes.
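Two of the code families named above can be generated in a few lines. The sketch below builds one period of a maximal length sequence from a 5-stage linear feedback shift register and a Walsh-Hadamard matrix by the Sylvester construction. The function names, the tap choice, and the sizes are assumptions for illustration; the new PN sequence proposed in the paper is not reproduced here.

```python
def m_sequence(taps, nbits):
    """One period (2**nbits - 1 bits) of an LFSR sequence; `taps` are 1-based stages."""
    state = [1] * nbits                   # any non-zero start state works
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])             # output the oldest stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]      # XOR of the tapped stages
        state = [feedback] + state[:-1]   # shift in the feedback bit
    return out

def hadamard(order):
    """Sylvester construction; `order` must be a power of two. Rows are Walsh codes."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

# Taps (5, 3) correspond to a primitive degree-5 polynomial, giving period 31.
pn = m_sequence(taps=[5, 3], nbits=5)
walsh = hadamard(8)
```

Mapped to +1/-1 values, the maximal length sequence has a periodic autocorrelation of -1 at every non-zero shift, while distinct Walsh-Hadamard rows are exactly orthogonal at zero shift.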
Abstract: Real-world Speaker Identification (SI) applications
differ from ideal or laboratory conditions, causing perturbations that
lead to a mismatch between the training and testing environments
and degrade performance drastically. Many strategies have been
adopted to cope with acoustical degradation; wavelet based Bayesian
marginal model is one of them. But Bayesian marginal models
cannot model the inter-scale statistical dependencies of different
wavelet scales. Simple nonlinear estimators for wavelet based
denoising assume that the wavelet coefficients in different scales are
independent in nature. However, wavelet coefficients have significant
inter-scale dependency. This paper exploits this inter-scale
dependency through a Circularly Symmetric Probability Density
Function (CS-PDF) related to the family of Spherically Invariant
Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain,
and the corresponding joint shrinkage estimator is derived by Maximum
a Posteriori (MAP) estimation. A framework based on these is
proposed to denoise speech signals for automatic speaker
identification problems. The robustness of the proposed framework is
tested in a Text Independent Speaker Identification application on
100 speakers of the POLYCOST and 100 speakers of the YOHO speech
databases in three different noise environments. Experimental results
show that the proposed estimator yields a higher improvement in
identification accuracy than other estimators with the popular
Gaussian Mixture Model (GMM) based speaker model and Mel-Frequency
Cepstral Coefficient (MFCC) features.
Abstract: The acoustic and articulatory properties of fricative speech sounds are being studied using magnetic resonance imaging (MRI) and acoustic recordings from a single subject. Area functions were derived from a complete set of axial and coronal MR slices using two different methods: the Mermelstein technique and the Blum transform. Area functions derived from the two techniques were shown to differ significantly in some cases. Such differences will lead to different acoustic predictions and it is important to know which is the more accurate. The vocal tract acoustic transfer function (VTTF) was derived from these area functions for each fricative and compared with measured speech signals for the same fricative and same subject. The VTTFs for /f/ in two vowel contexts and the corresponding acoustic spectra are derived here; the Blum transform appears to show a better match between prediction and measurement than the Mermelstein technique.
Abstract: This paper studies the effect of different compression
constraints and schemes presented in a new and flexible paradigm to
achieve high compression ratios and acceptable signal to noise ratios
of Arabic speech signals. Compression parameters are computed for
variable frame sizes of a level 5 to 7 Discrete Wavelet Transform
(DWT) representation of the signals for different analyzing mother
wavelet functions. Results are obtained and compared for global
threshold and level-dependent threshold techniques. The results
obtained also include comparisons in terms of Signal to Noise Ratio,
Peak Signal to Noise Ratio, and Normalized Root Mean Square Error.
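The three quality measures named above can be computed as follows. The abstract does not give its exact formulas, so the definitions below are one common choice and should be read as assumptions: SNR as the ratio of signal energy to error energy, PSNR relative to the peak absolute amplitude of the original, and NRMSE normalized by the energy of the mean-removed original. The function names are hypothetical.

```python
import math

def snr_db(original, reconstructed):
    """Signal energy over error energy, in dB."""
    signal = sum(v * v for v in original)
    error = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return 10 * math.log10(signal / error)

def psnr_db(original, reconstructed):
    """Squared peak amplitude over mean squared error, in dB."""
    peak = max(abs(v) for v in original)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return 10 * math.log10(peak * peak / mse)

def nrmse(original, reconstructed):
    """Root of error energy over energy of the mean-removed original."""
    mean = sum(original) / len(original)
    num = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    den = sum((a - mean) ** 2 for a in original)
    return math.sqrt(num / den)

# Toy example: the reconstruction is the original scaled by 0.9.
x = [1.0, -1.0, 1.0, -1.0]
r = [0.9, -0.9, 0.9, -0.9]
scores = (snr_db(x, r), psnr_db(x, r), nrmse(x, r))
```

In this toy case the error energy is one hundredth of the signal energy, so both SNR and PSNR come out at about 20 dB and the NRMSE at about 0.1.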
Abstract: Emotion recognition is an important research field with many applications. This work focuses on recognizing different emotions from speech signals. The extracted features are related to statistics of pitch, formants, and energy contours, as well as spectral, perceptual, and temporal features, jitter, and shimmer. An Artificial Neural Network (ANN) was chosen as the classifier. Our concern is to find a robust and fast ANN classifier suitable for different real-life applications. Several experiments were carried out on different ANNs to investigate the factors that affect the classification success rate. Using a database containing 7 different emotions, it is shown that with proper and careful adjustment of the feature format, training data sorting, number of features selected, and even the ANN type and architecture used, a success rate of 85% or more can be achieved without increasing the system complexity or the computation time.
Abstract: Although the level crossing concept has been the subject of intensive investigation over the last few years, certain problems of great interest remain unsolved. One of these is the distribution of threshold levels. This paper presents new threshold level allocation schemes for level crossing based on nonuniform sampling. Intuitively, it is more reasonable to sample the information-rich regions of the signal more finely and those with sparse information more coarsely. To achieve this objective, we propose non-linear quantization functions which dynamically assign the number of quantization levels depending on the importance of the given amplitude range. Two new approaches to determining the importance of a given amplitude segment are presented. The proposed methods are based on exponential and logarithmic functions. Various aspects of the proposed techniques are discussed and experimentally validated. Their efficacy is investigated by comparison with uniform sampling.
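A minimal sketch of level crossing sampling with a nonuniform level set follows. The exponential spacing used here, which places levels more densely near zero amplitude, is a hypothetical allocation chosen for illustration; the abstract does not specify the actual exponential or logarithmic functions. The names `exponential_levels` and `level_crossings` are likewise assumptions, and crossing times are found by linear interpolation between uniform samples.

```python
import math

def exponential_levels(count, vmax):
    """Hypothetical level set: `count` levels per polarity, denser near zero."""
    pos = [vmax * math.expm1(k / count) / math.expm1(1.0) for k in range(1, count + 1)]
    return [-v for v in reversed(pos)] + pos

def level_crossings(signal, levels):
    """Return (time, level) events where the signal crosses one of the levels."""
    events = []
    for i in range(1, len(signal)):
        a, b = signal[i - 1], signal[i]
        if a < b:      # rising segment: report levels in increasing order
            crossed = sorted(lv for lv in levels if a < lv <= b)
        elif a > b:    # falling segment: report levels in decreasing order
            crossed = sorted((lv for lv in levels if b <= lv < a), reverse=True)
        else:
            continue
        for lv in crossed:
            # Linear interpolation of the crossing instant, in sample units.
            events.append(((i - 1) + (lv - a) / (b - a), lv))
    return events

levels = exponential_levels(4, 1.0)
signal = [0.9 * math.sin(2 * math.pi * n / 50) for n in range(100)]
events = level_crossings(signal, levels)
```

Each event stores the crossing time and the level crossed, so the number of samples follows the signal's activity rather than a fixed clock.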
Abstract: In this paper, a new adaptive Fourier decomposition
(AFD) based time-frequency speech analysis approach is proposed.
Given that the fundamental frequency of speech signals often
fluctuates, the classical short-time Fourier transform (STFT)
based spectrogram analysis suffers from the difficulty of window size
selection. AFD is a newly developed signal decomposition theory. It is
designed to deal with time-varying non-stationary signals. Its
outstanding characteristic is to provide instantaneous frequency for
each decomposed component, so the time-frequency analysis becomes
easier. Experiments are conducted on a sample sentence from the
TIMIT Acoustic-Phonetic Continuous Speech Corpus. The results
show that the AFD based time-frequency distribution outperforms the
STFT based one.
Abstract: An algorithm for learning an overcomplete dictionary
using a Cauchy mixture model for sparse decomposition of an underdetermined
mixing system is introduced. The mixture density
function is derived from a ratio sample of the observed mixture
signals where 1) there are at least two but not necessarily more
mixture signals observed, 2) the source signals are statistically
independent and 3) the sources are sparse. The basis vectors of the
dictionary are learned via the optimization of the location parameters
of the Cauchy mixture components, which is shown to be more
accurate and robust than the conventional data mining methods
usually employed for this task. Using a well known sparse
decomposition algorithm, we extract three speech signals from two
mixtures based on the estimated dictionary. Further tests with
additive Gaussian noise are used to demonstrate the proposed
algorithm's robustness to outliers.
Abstract: Although Arabic is currently one of the most widely
spoken languages worldwide, there has been relatively little
research on Arabic speech recognition compared with other
languages such as English and Japanese. Generally, digital speech
processing and voice recognition algorithms are of special
importance for designing efficient, accurate, as well as fast automatic
speech recognition systems. The speech recognition process
carried out in this paper is divided into three stages. First,
the signal is preprocessed to reduce noise effects; the
signal is then digitized and hearingized, and the voice activity
regions are segmented using a voice activity detection (VAD)
algorithm. Second, features are extracted from the speech signal
using the Mel-frequency cepstral coefficients (MFCC) algorithm,
and delta and acceleration (delta-delta) coefficients are
added to improve the recognition accuracy. Finally,
each test word's features are compared to the training database using
the dynamic time warping (DTW) algorithm. Using the best settings
for all parameters of the aforementioned techniques,
the proposed system achieved a recognition rate of about 98.5%,
which outperformed other HMM and ANN-based approaches
available in the literature.
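The final matching stage can be sketched with the classic DTW recurrence. This is a minimal illustration, not the paper's configuration: the feature vectors are one-dimensional toy values rather than MFCCs with delta coefficients, and the names `dtw_distance`, `recognize`, and the template words are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment of two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])          # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],       # stretch sequence b
                                 cost[i][j - 1],       # stretch sequence a
                                 cost[i - 1][j - 1])   # advance both
    return cost[n][m]

def recognize(test, templates):
    """Return the template word with the smallest DTW distance to the test word."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))

# Toy templates: each frame is a one-dimensional "feature vector".
templates = {
    "yes": [(0.0,), (1.0,), (2.0,), (1.0,)],
    "no": [(2.0,), (2.0,), (0.0,), (0.0,)],
}
# A slower utterance of "yes": same shape, with two frames repeated.
test_word = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
best = recognize(test_word, templates)
```

Because DTW allows frames to be repeated along the alignment path, the slower utterance still matches its template with zero accumulated cost.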
Abstract: In applications such as telecommunications, hands-free communication, and recording, which need at least one microphone, the signal is usually corrupted by noise and echo. An important application is speech enhancement, which removes the noise and echo picked up by a microphone alongside the desired speech. Accordingly, the microphone signal has to be cleaned using digital signal processing (DSP) tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving speech by recovering the desired speech signal from noisy observations, especially in mobile communication. In this paper, we reconstruct a speech signal observed in additive background noise, using the Kalman filter technique to estimate the parameters of the autoregressive (AR) process in the state-space model; the output speech signal is obtained in MATLAB. Accurate estimation by the Kalman filter enhances the speech and reduces the noise. We then compare and discuss the actual and estimated values that produce the reconstructed signals.
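The state-space idea above can be illustrated with a scalar Kalman filter. The sketch makes strong simplifying assumptions that are not taken from the paper: the clean signal is a first-order AR process whose coefficient `a` and noise variances `q` and `r` are known in advance, whereas the paper estimates the AR parameters. It is written in Python rather than MATLAB, and all names are hypothetical.

```python
import random

def kalman_ar1(observations, a, q, r):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w[k], y[k] = x[k] + v[k]."""
    x_hat, p = 0.0, 1.0                 # initial state estimate and its variance
    estimates = []
    for y in observations:
        x_pred = a * x_hat              # predict the state
        p_pred = a * a * p + q          # predict the error variance
        gain = p_pred / (p_pred + r)    # Kalman gain
        x_hat = x_pred + gain * (y - x_pred)   # correct with the new observation
        p = (1.0 - gain) * p_pred
        estimates.append(x_hat)
    return estimates

def mse(u, v):
    return sum((s - t) ** 2 for s, t in zip(u, v)) / len(u)

random.seed(1)
a, q, r = 0.95, 0.1, 1.0                # assumed AR coefficient and noise variances
clean, noisy = [], []
x = 0.0
for _ in range(2000):
    x = a * x + random.gauss(0.0, q ** 0.5)
    clean.append(x)
    noisy.append(x + random.gauss(0.0, r ** 0.5))

estimates = kalman_ar1(noisy, a, q, r)
mse_noisy = mse(noisy, clean)           # error of the raw observation
mse_filtered = mse(estimates, clean)    # error after filtering
```

Comparing `mse_filtered` with `mse_noisy` mirrors the abstract's comparison between actual and estimated values.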
Abstract: This paper describes an Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of a convolutive mixture of speech picked up by a linear microphone array. The proposed algorithm extracts independent sources by non-Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relative performance of the algorithm under random initialization and Null beamformer (NBF) based initialization is studied. It has been found that an NBF based initial value gives faster convergence as well as better separation performance.
Abstract: Vector quantization is a powerful tool for speech
coding applications. This paper deals with LPC coding of speech
signals using a new technique called Multi Switched Split
Vector Quantization (MSSVQ), which is a hybrid of the multi stage,
switched, and split vector quantization techniques. The spectral
distortion performance, computational complexity, and memory
requirements of MSSVQ are compared to split vector quantization
(SVQ), multi stage vector quantization (MSVQ), and switched split
vector quantization (SSVQ) techniques. The results show that
MSSVQ has better spectral distortion performance, lower
computational complexity, and lower memory requirements than
all of the above-mentioned product code vector
quantization techniques. Computational complexity is measured in
floating point operations (flops), and memory requirements are
measured in floats.
Abstract: We analyze the effectiveness of different pseudo noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligibility. A speech signal can be viewed as a sequence of correlated samples, and each sample as a sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random-like properties that help in reducing the correlation among speech samples. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. The results of the investigation show the effectiveness of large Kasami sequences for this purpose among many PN sequences.
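The two correlation measures can be sketched as follows. The abstract does not state its formulas, so the code assumes the definitions commonly used for spreading sequences: the aperiodic correlation is normalized by the sequence length N, MSAAC averages the squared autocorrelation over all non-zero shifts and over the set, and MSACC averages the squared cross-correlation over all shifts and all ordered pairs of distinct sequences. The function names are hypothetical.

```python
def aperiodic_corr(a, b, tau):
    """Normalized aperiodic correlation of two equal-length +1/-1 sequences."""
    n = len(a)
    if tau >= 0:
        s = sum(a[i] * b[i + tau] for i in range(n - tau))
    else:
        s = sum(a[i - tau] * b[i] for i in range(n + tau))
    return s / n

def msaac(seqs):
    """Mean square aperiodic autocorrelation over non-zero shifts."""
    n = len(seqs[0])
    total = sum(aperiodic_corr(s, s, tau) ** 2
                for s in seqs for tau in range(1 - n, n) if tau != 0)
    return total / len(seqs)

def msacc(seqs):
    """Mean square aperiodic cross-correlation over all shifts and distinct pairs."""
    n, m = len(seqs[0]), len(seqs)
    total = sum(aperiodic_corr(seqs[i], seqs[k], tau) ** 2
                for i in range(m) for k in range(m) if k != i
                for tau in range(1 - n, n))
    return total / (m * (m - 1))

# Barker code of length 7: every autocorrelation sidelobe has magnitude 0 or 1/7.
barker7 = [1, 1, 1, -1, -1, 1, -1]
auto = msaac([barker7])
cross = msacc([[1, 1], [1, -1]])
```

Lower values of both measures indicate sequences that look more random and interfere less with one another.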
Abstract: This paper considers the problem of cryptanalysis of stream ciphers. There is a need to improve the existing attacks on stream ciphers and to distinguish the portions of ciphertext obtained by encrypting plaintext in which some parts are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information-theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP systems in computer networks using the LFSR algorithm. The implementation program will be developed in Java 2. The LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper implements an encryption module that converts speech signals to ciphertext and a decryption module that converts ciphertext back to speech signals.
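The encryption and decryption modules described above can be illustrated with a toy LFSR stream cipher. The paper's implementation is in Java; this sketch uses Python, and the 16-bit register, its tap positions (16, 14, 13, 11), the seed, and the function names are assumptions for illustration. A single LFSR is not cryptographically secure on its own, since its state can be recovered from a short stretch of keystream, so this shows the mechanism only.

```python
import struct

def lfsr_keystream(seed, nbytes):
    """Generate `nbytes` of keystream from a 16-bit Fibonacci LFSR."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            # Feedback from taps 16, 14, 13, 11 (a maximal-length configuration).
            bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            byte = (byte << 1) | (state & 1)     # collect the output bit
            state = (state >> 1) | (bit << 15)   # shift in the feedback bit
        out.append(byte)
    return bytes(out)

def xor_cipher(data, seed):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    keystream = lfsr_keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

# Four 16-bit PCM samples standing in for a speech frame.
samples = [0, 1200, -3400, 250]
plain = struct.pack("<4h", *samples)
cipher = xor_cipher(plain, seed=0xACE1)
recovered = struct.unpack("<4h", xor_cipher(cipher, seed=0xACE1))
```

Sender and receiver only need to share the seed, and the byte-by-byte operation is what makes this style of cipher suitable for streaming data.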
Abstract: In this paper, an extended method of the directionally constrained minimization of power (DCMP) algorithm for broadband signals is proposed. The DCMP algorithm is a useful technique for extracting a target signal from the signals observed by a microphone array system. In the DCMP algorithm, the output power of the microphone array is minimized under a constraint of constant responses to the directions of arrival (DOAs) of specific signals. In our algorithm, the computation time is reduced by limiting the directional constraint to the direction perpendicular to the sensor array.
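For reference, the constrained minimization described above has a well-known closed form in the narrowband, single-constraint case. The symbols below are assumptions introduced for illustration and do not come from the paper: $\mathbf{w}$ is the weight vector, $\mathbf{R}$ the covariance matrix of the observed signals, and $\mathbf{c}$ the steering vector of the constrained direction. The paper's broadband extension is not reproduced.

```latex
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{subject to} \quad \mathbf{c}^{H}\mathbf{w} = 1,
\qquad
\mathbf{w}_{\mathrm{opt}}
  = \frac{\mathbf{R}^{-1}\mathbf{c}}{\mathbf{c}^{H}\mathbf{R}^{-1}\mathbf{c}} .
```

Under a far-field plane-wave assumption, a source perpendicular to a linear array reaches all sensors simultaneously, so $\mathbf{c} = [1, \dots, 1]^{T}$ at every frequency. This is one plausible reading of why restricting the constraint to that direction reduces the computation.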
Abstract: We present a novel scheme to recognize isolated speech
signals using certain statistical parameters derived from those signals.
The determination of the statistical estimates is based on extracted
signal information rather than the original signal information in
order to reduce the computational complexity. Subtle details of
these estimates, after extracting the speech signal from ambient
noise, are first exploited to segregate the polysyllabic words from
the monosyllabic ones. Precise recognition of each distinct word is
then carried out by analyzing the histogram obtained from this
information.