Abstract: A simple adaptive voice activity detector (VAD) is
implemented using Gabor and gammatone atomic decomposition of
speech for high Gaussian noise environments. Matching pursuit is
used for atomic decomposition, and is shown to achieve optimal
speech detection capability at high data compression rates for low
signal-to-noise ratios. The most active dictionary elements found by
matching pursuit are used for the signal reconstruction so that the
algorithm adapts to the individual speaker's dominant time-frequency
characteristics. Speech has a high peak-to-average ratio, enabling the
matching pursuit greedy heuristic of selecting the highest inner
products to isolate high-energy speech components in high-noise
environments. Gabor
and gammatone atoms are both investigated with identical
logarithmically spaced center frequencies, and similar bandwidths.
The algorithm performs equally well for both Gabor and gammatone
atoms with no significant statistical differences. The algorithm
achieves 70% accuracy at a 0 dB SNR, 90% accuracy at a 5 dB SNR,
and 98% accuracy at a 20 dB SNR, using 30 dB SNR as a reference
for voice activity.
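The greedy selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the atom parameters (`center`, `freq`, `width`), and the three-atom dictionary are assumptions made for the example, and gammatone atoms are omitted for brevity.

```python
import math

def gabor_atom(n, center, freq, width):
    """Unit-norm Gabor atom: a Gaussian window times a cosine (illustrative parameters)."""
    atom = [
        math.exp(-0.5 * ((t - center) / width) ** 2)
        * math.cos(2 * math.pi * freq * (t - center))
        for t in range(n)
    ]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy decomposition: at each step pick the atom with the largest |inner product|."""
    residual = list(signal)
    picks = []  # (atom index, coefficient) pairs in order of selection
    for _ in range(n_iter):
        products = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(products[k]))
        coeff = products[best]
        picks.append((best, coeff))
        # Remove the selected component from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return picks, residual

# Hypothetical toy dictionary: three atoms at assumed normalized frequencies.
dictionary = [gabor_atom(64, 32, f, 8.0) for f in (0.05, 0.1, 0.2)]
signal = [2.0 * a for a in dictionary[1]]
picks, residual = matching_pursuit(signal, dictionary, 1)
```

Because the atoms are normalized, the inner product with the residual is directly the coefficient of the selected atom, which is why a single subtraction per iteration suffices.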
Abstract: Independent component analysis can estimate unknown
source signals from their mixtures under the assumption that the
source signals are statistically independent. However, in a real environment,
the separation performance often deteriorates because
the number of source signals differs from the number of sensors.
In this paper, we propose a method for estimating the number of
sources based on the joint distribution of the observed signals
under a two-sensor configuration. Several simulation results show
that the number of sources coincides with the number of
peaks in the histogram of the distribution. The proposed method can
estimate the number of sources even if it is larger than the number of
observed signals. The proposed method has been verified by
several experiments.
Abstract: A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment clearly showed the ability of the composite system to cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources (noise sources that come from different locations but are not always present at the same time), e.g. speech from another passenger or from a radio while the desired speech is present. To work in an environment with frequently moving noise sources, the three-microphone VAD detects voice from a “VAD valid zone”, while the three-microphone noise canceller uses a “noise canceller valid zone” defined in free space around the user's head. The desired voice should therefore lie in the intersection of the noise canceller valid zone and the VAD valid zone, and all noise outside this intersection is suppressed. Experimental results are presented for a real environment: all results were recorded in a car using omnidirectional electret condenser microphones.
Abstract: In this paper we present a statistical analysis of Voice
over IP (VoIP) packet streams produced by the G.711 voice coder
with voice activity detection (VAD). During a telephone conversation,
the voice coder produces packets or not, depending on whether
the interlocutor speaks (ON) or remains silent (OFF). As the index of
dispersion for both the ON and OFF time distributions was greater than
one, we used a hyperexponential distribution to approximate the
stream durations. For each stage of the hyperexponential distribution,
we tested the goodness of our fits using graphical methods, calculated
estimation errors, and performed the Kolmogorov-Smirnov test.
The results showed that a precise VoIP source model can be
based on a five-state Markov process.
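The link between a dispersion measure above one and the choice of a hyperexponential distribution can be illustrated with a short calculation. This sketch assumes that the dispersion measure is the squared coefficient of variation (variance divided by squared mean), which the abstract does not state explicitly. The branch probabilities and rates are illustrative values, not the fitted parameters from the paper.

```python
def hyperexp_moments(probs, rates):
    """Mean and variance of a hyperexponential distribution.

    With probability probs[i], the duration is exponential with rate rates[i].
    """
    mean = sum(p / r for p, r in zip(probs, rates))
    second_moment = sum(2.0 * p / (r * r) for p, r in zip(probs, rates))
    return mean, second_moment - mean * mean

def squared_cv(probs, rates):
    """Squared coefficient of variation: variance / mean^2 (assumed dispersion measure)."""
    mean, var = hyperexp_moments(probs, rates)
    return var / (mean * mean)

# A single exponential stage always gives exactly 1.
exp_cv2 = squared_cv([1.0], [2.0])
# Mixing stages with different rates (hypothetical values) pushes the measure above 1.
hyper_cv2 = squared_cv([0.5, 0.5], [1.0, 10.0])
```

A plain exponential model cannot reproduce a value above one, whereas a mixture of exponential stages can, which is consistent with the modeling choice described above.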
Abstract: Although Arabic is currently one
of the most common languages worldwide, there has been relatively
little research on Arabic speech recognition compared to other
languages such as English and Japanese. Generally, digital speech
processing and voice recognition algorithms are of special
importance for designing efficient, accurate, as well as fast automatic
speech recognition systems. The speech recognition process
carried out in this paper is divided into three stages. Firstly,
the signal is preprocessed to reduce noise effects; it is then
digitized and hearingized, and the voice activity
regions are segmented using a voice activity detection (VAD)
algorithm. Secondly, features are extracted from the speech signal
using the Mel-frequency cepstral coefficients (MFCC) algorithm,
and delta and acceleration (delta-delta) coefficients are
added to improve the recognition accuracy. Finally,
each test word's features are compared to the training database using
the dynamic time warping (DTW) algorithm. Using the best settings
for all parameters affecting the aforementioned techniques,
the proposed system achieved a recognition rate of about 98.5%,
which outperformed other HMM and ANN-based approaches
available in the literature.
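The final comparison stage can be sketched with the classic DTW recurrence. This is a minimal illustration under stated assumptions, not the system described in the abstract: the function name `dtw_distance`, the Euclidean frame cost, and the unconstrained step pattern are choices made for the example, and the short sequences of one-dimensional frames stand in for real MFCC feature vectors.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each element is a tuple of floats (one frame); the frame cost is Euclidean.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_b[j-1] is reused for another frame of seq_a
                cost[i][j - 1],      # seq_a[i-1] is reused for another frame of seq_b
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Hypothetical toy frames: the second sequence repeats one frame of the first.
same_word = dtw_distance([(1.0,), (2.0,), (3.0,)], [(1.0,), (2.0,), (2.0,), (3.0,)])
other_word = dtw_distance([(0.0,), (0.0,)], [(1.0,), (1.0,)])
```

In a word recognizer of this kind, a test word would typically be assigned the label of the training template with the smallest accumulated cost.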