Abstract: The paper presents a complete discrete statistical framework, based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique that we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure, when the estimation of HMM parameters is performed. The DVQ technique is implemented through two variants. The first variant uses the K-means algorithm (K-means- DVQ) to optimize the VQ, while the second variant exploits the benefits of the classification behavior of neural networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system by experiments of specific Arabic consonants recognition. The results show that the distributed vector quantization technique increase the performance of the discrete HMM system.
Abstract: The paper presents an on-line recognition machine
(RM) for continuous/isolated, dynamic and static gestures that arise
in Flight Deck Officer (FDO) training. RM is based on generic pattern
recognition framework. Gestures are represented as templates using
summary statistics. The proposed recognition algorithm exploits temporal
and spatial characteristics of gestures via dynamic programming
and Markovian process. The algorithm predicts corresponding index
of incremental input data in the templates in an on-line mode.
Accumulated consistency in the sequence of prediction provides a
similarity measurement (Score) between input data and the templates.
The algorithm provides an intuitive mechanism for automatic detection
of start/end frames of continuous gestures. In the present paper,
we consider isolated gestures. The performance of RM is evaluated
using four datasets - artificial (W TTest), hand motion (Yang) and
FDO (tracker, vision-based ). RM achieves comparable results which
are in agreement with other on-line and off-line algorithms such as
hidden Markov model (HMM) and dynamic time warping (DTW).
The proposed algorithm has the additional advantage of providing
timely feedback for training purposes.
Abstract: In this paper, a novel method for recognition of musical
instruments in a polyphonic music is presented by using an
embedded hidden Markov model (EHMM). EHMM is a doubly
embedded HMM structure where each state of the external HMM
is an independent HMM. The classification is accomplished for
two different internal HMM structures where GMMs are used as
likelihood estimators for the internal HMMs. The results are compared
to those achieved by an artificial neural network with two
hidden layers. Appropriate classification accuracies were achieved
both for solo instrument performance and instrument combinations
which demonstrates that the new approach outperforms the similar
classification methods by means of the dynamic of the signal.
Abstract: In this study, the use of silicon NAM (Non-Audible
Murmur) microphone in automatic speech recognition is presented.
NAM microphones are special acoustic sensors, which are attached
behind the talker-s ear and can capture not only normal (audible)
speech, but also very quietly uttered speech (non-audible murmur).
As a result, NAM microphones can be applied in automatic speech
recognition systems when privacy is desired in human-machine communication.
Moreover, NAM microphones show robustness against
noise and they might be used in special systems (speech recognition,
speech conversion etc.) for sound-impaired people. Using a small
amount of training data and adaptation approaches, 93.9% word
accuracy was achieved for a 20k Japanese vocabulary dictation
task. Non-audible murmur recognition in noisy environments is also
investigated. In this study, further analysis of the NAM speech has
been made using distance measures between hidden Markov model
(HMM) pairs. It has been shown the reduced spectral space of NAM
speech using a metric distance, however the location of the different
phonemes of NAM are similar to the location of the phonemes
of normal speech, and the NAM sounds are well discriminated.
Promising results in using nonlinear features are also introduced,
especially under noisy conditions.
Abstract: Hand gesture is an active area of research in the vision
community, mainly for the purpose of sign language recognition and
Human Computer Interaction. In this paper, we propose a system to
recognize alphabet characters (A-Z) and numbers (0-9) in real-time
from stereo color image sequences using Hidden Markov Models
(HMMs). Our system is based on three main stages; automatic segmentation
and preprocessing of the hand regions, feature extraction
and classification. In automatic segmentation and preprocessing stage,
color and 3D depth map are used to detect hands where the hand
trajectory will take place in further step using Mean-shift algorithm
and Kalman filter. In the feature extraction stage, 3D combined features
of location, orientation and velocity with respected to Cartesian
systems are used. And then, k-means clustering is employed for
HMMs codeword. The final stage so-called classification, Baum-
Welch algorithm is used to do a full train for HMMs parameters.
The gesture of alphabets and numbers is recognized using Left-Right
Banded model in conjunction with Viterbi algorithm. Experimental
results demonstrate that, our system can successfully recognize hand
gestures with 98.33% recognition rate.
Abstract: Gesture recognition is a challenging task for extracting
meaningful gesture from continuous hand motion. In this paper, we propose an automatic system that recognizes isolated gesture,
in addition meaningful gesture from continuous hand motion for Arabic numbers from 0 to 9 in real-time based on Hidden Markov Models (HMM). In order to handle isolated gesture, HMM using
Ergodic, Left-Right (LR) and Left-Right Banded (LRB) topologies is applied over the discrete vector feature that is extracted from stereo
color image sequences. These topologies are considered to different
number of states ranging from 3 to 10. A new system is developed to recognize the meaningful gesture based on zero-codeword detection
with static velocity motion for continuous gesture. Therefore, the
LRB topology in conjunction with Baum-Welch (BW) algorithm for
training and forward algorithm with Viterbi path for testing presents the best performance. Experimental results show that the proposed system can successfully recognize isolated and meaningful gesture and achieve average rate recognition 98.6% and 94.29% respectively.
Abstract: This paper presents a technical speaker adaptation
method called WMLLR, which is based on maximum likelihood linear
regression (MLLR). In MLLR, a linear regression-based transform
which adapted the HMM mean vectors was calculated to maximize the
likelihood of adaptation data. In this paper, the prior knowledge of the
initial model is adequately incorporated into the adaptation. A series of
speaker adaptation experiments are carried out at a 30 famous city
names database to investigate the efficiency of the proposed method.
Experimental results show that the WMLLR method outperforms the
conventional MLLR method, especially when only few utterances
from a new speaker are available for adaptation.
Abstract: Current image-based individual human recognition
methods, such as fingerprints, face, or iris biometric modalities
generally require a cooperative subject, views from certain aspects,
and physical contact or close proximity. These methods cannot
reliably recognize non-cooperating individuals at a distance in the
real world under changing environmental conditions. Gait, which
concerns recognizing individuals by the way they walk, is a relatively
new biometric without these disadvantages. The inherent gait
characteristic of an individual makes it irreplaceable and useful in
visual surveillance.
In this paper, an efficient gait recognition system for human
identification by extracting two features namely width vector of
the binary silhouette and the MPEG-7-based region-based shape
descriptors is proposed. In the proposed method, foreground objects
i.e., human and other moving objects are extracted by estimating
background information by a Gaussian Mixture Model (GMM) and
subsequently, median filtering operation is performed for removing
noises in the background subtracted image. A moving target classification
algorithm is used to separate human being (i.e., pedestrian)
from other foreground objects (viz., vehicles). Shape and boundary
information is used in the moving target classification algorithm.
Subsequently, width vector of the outer contour of binary silhouette
and the MPEG-7 Angular Radial Transform coefficients are taken as
the feature vector. Next, the Principal Component Analysis (PCA)
is applied to the selected feature vector to reduce its dimensionality.
These extracted feature vectors are used to train an Hidden Markov
Model (HMM) for identification of some individuals. The proposed
system is evaluated using some gait sequences and the experimental
results show the efficacy of the proposed algorithm.
Abstract: In this paper we present an efficient system for
independent speaker speech recognition based on neural network
approach. The proposed architecture comprises two phases: a
preprocessing phase which consists in segmental normalization and
features extraction and a classification phase which uses neural
networks based on nonparametric density estimation namely the
general regression neural network (GRNN). The relative
performances of the proposed model are compared to the similar
recognition systems based on the Multilayer Perceptron (MLP), the
Recurrent Neural Network (RNN) and the well known Discrete
Hidden Markov Model (HMM-VQ) that we have achieved also.
Experimental results obtained with Arabic digits have shown that the
use of nonparametric density estimation with an appropriate
smoothing factor (spread) improves the generalization power of the
neural network. The word error rate (WER) is reduced significantly
over the baseline HMM method. GRNN computation is a successful
alternative to the other neural network and DHMM.