Abstract: This paper presents various classifiers results from a system that can automatically recognize four different static human body postures in video sequences. The considered postures are standing, sitting, squatting, and lying. The three classifiers considered are a naïve one and two based on the belief theory. The belief theory-based classifiers use either a classic or restricted plausibility criterion to make a decision after data fusion. The data come from the people 2D segmentation and from their face localization. Measurements consist in distances relative to a reference posture. The efficiency and the limits of the different classifiers on the recognition system are highlighted thanks to the analysis of a great number of results. This system allows real-time processing.
Abstract: This paper proposes a novel system for monitoring the
health of underground pipelines. Some of these pipelines transport
dangerous contents and any damage incurred might have catastrophic
consequences. However, most of these damage are unintentional and
usually a result of surrounding construction activities. In order to
prevent these potential damages, monitoring systems are
indispensable. This paper focuses on acoustically recognizing road
cutters since they prelude most construction activities in modern
cities. Acoustic recognition can be easily achieved by installing a
distributed computing sensor network along the pipelines and using
smart sensors to “listen" for potential threat; if there is a real threat,
raise some form of alarm. For efficient pipeline monitoring, a novel
monitoring approach is proposed. Principal Component Analysis
(PCA) was studied and applied. Eigenvalues were regarded as the
special signature that could characterize a sound sample, and were
thus used for the feature vector for sound recognition. The denoising
ability of PCA could make it robust to noise interference. One class
SVM was used for classifier. On-site experiment results show that the
proposed PCA and SVM based acoustic recognition system will be
very effective with a low tendency for raising false alarms.
Abstract: Mel Frequency Cepstral Coefficient (MFCC) features
are widely used as acoustic features for speech recognition as well
as speaker recognition. In MFCC feature representation, the Mel frequency
scale is used to get a high resolution in low frequency region,
and a low resolution in high frequency region. This kind of processing
is good for obtaining stable phonetic information, but not suitable
for speaker features that are located in high frequency regions. The
speaker individual information, which is non-uniformly distributed
in the high frequencies, is equally important for speaker recognition.
Based on this fact we proposed an admissible wavelet packet based
filter structure for speaker identification. Multiresolution capabilities
of wavelet packet transform are used to derive the new features.
The proposed scheme differs from previous wavelet based works,
mainly in designing the filter structure. Unlike others, the proposed
filter structure does not follow Mel scale. The closed-set speaker
identification experiments performed on the TIMIT database shows
improved identification performance compared to other commonly
used Mel scale based filter structures using wavelets.
Abstract: Years of extensive research in the field of speech
processing for compression and recognition in the last five decades,
resulted in a severe competition among the various methods and
paradigms introduced. In this paper we include the different representations
of speech in the time-frequency and time-scale domains
for the purpose of compression and recognition. The examination of
these representations in a variety of related work is accomplished.
In particular, we emphasize methods related to Fourier analysis
paradigms and wavelet based ones along with the advantages and
disadvantages of both approaches.
Abstract: A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.
Abstract: Face Recognition is a field of multidimensional
applications. A lot of work has been done, extensively on the most of
details related to face recognition. This idea of face recognition using
PCA is one of them. In this paper the PCA features for Feature
extraction are used and matching is done for the face under
consideration with the test image using Eigen face coefficients. The
crux of the work lies in optimizing Euclidean distance and paving the
way to test the same algorithm using Matlab which is an efficient tool
having powerful user interface along with simplicity in representing
complex images.
Abstract: ICA which is generally used for blind source separation
problem has been tested for feature extraction in Speech recognition
system to replace the phoneme based approach of MFCC. Applying
the Cepstral coefficients generated to ICA as preprocessing has
developed a new signal processing approach. This gives much better
results against MFCC and ICA separately, both for word and speaker
recognition. The mixing matrix A is different before and after MFCC
as expected. As Mel is a nonlinear scale. However, cepstrals
generated from Linear Predictive Coefficient being independent
prove to be the right candidate for ICA. Matlab is the tool used for
all comparisons. The database used is samples of ISOLET.
Abstract: The use of High Order Statistics (HOS) analysis is
expected to provide so many candidates of features that can be selected for pattern recognition. More candidates of the feature can
be extracted using simple manipulation through a specific mathematical function prior to the HOS analysis. Feature extraction
method using HOS analysis combined with Difference to the Nth-Power manipulation has been examined in application for Automatic
Modulation Recognition (AMR) to perform scheme recognition of three digital modulation signal, i.e. QPSK-16QAM-64QAM in the
AWGN transmission channel. The simulation results is reported
when the analysis of HOS up to order-12 and the manipulation of Difference to the Nth-Power up to N = 4. The obtained accuracy rate
of AMR using the method of Simple Decision obtained 90% in SNR > 10 dB in its classifier, while using the method of Voted Decision is
96% in SNR > 2 dB.
Abstract: The paper presents a complete discrete statistical framework, based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique that we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure, when the estimation of HMM parameters is performed. The DVQ technique is implemented through two variants. The first variant uses the K-means algorithm (K-means- DVQ) to optimize the VQ, while the second variant exploits the benefits of the classification behavior of neural networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system by experiments of specific Arabic consonants recognition. The results show that the distributed vector quantization technique increase the performance of the discrete HMM system.
Abstract: The temporal nature of negative selection is an under exploited area. In a negative selection system, newly generated antibodies go through a maturing phase, and the survivors of the phase then wait to be activated by the incoming antigens after certain number of matches. These without having enough matches will age and die, while these with enough matches (i.e., being activated) will become active detectors. A currently active detector may also age and die if it cannot find any match in a pre-defined (lengthy) period of time. Therefore, what matters in a negative selection system is the dynamics of the involved parties in the current time window, not the whole time duration, which may be up to eternity. This property has the potential to define the uniqueness of negative selection in comparison with the other approaches. On the other hand, a negative selection system is only trained with “normal" data samples. It has to learn and discover unknown “abnormal" data patterns on the fly by itself. Consequently, it is more appreciate to utilize negation selection as a system for pattern discovery and recognition rather than just pattern recognition. In this paper, we study the potential of using negative selection in discovering unknown temporal patterns.
Abstract: The proposed system identifies the species of the wood
using the textural features present in its barks. Each species of a wood
has its own unique patterns in its bark, which enabled the proposed
system to identify it accurately. Automatic wood recognition system
has not yet been well established mainly due to lack of research in this
area and the difficulty in obtaining the wood database. In our work, a
wood recognition system has been designed based on pre-processing
techniques, feature extraction and by correlating the features of those
wood species for their classification. Texture classification is a problem
that has been studied and tested using different methods due to its
valuable usage in various pattern recognition problems, such as wood
recognition, rock classification. The most popular technique used
for the textural classification is Gray-level Co-occurrence Matrices
(GLCM). The features from the enhanced images are thus extracted
using the GLCM is correlated, which determines the classification
between the various wood species. The result thus obtained shows a
high rate of recognition accuracy proving that the techniques used in
suitable to be implemented for commercial purposes.
Abstract: The paper discusses the mathematics of pattern
indexing and its applications to recognition of visual patterns that are
found in video clips. It is shown that (a) pattern indexes can be
represented by collections of inverted patterns, (b) solutions to
pattern classification problems can be found as intersections and
histograms of inverted patterns and, thus, matching of original
patterns avoided.
Abstract: One of the main image representations in Mathematical Morphology is the 3D Shape Decomposition Representation, useful for Image Compression and Representation,and Pattern Recognition. The 3D Morphological Shape Decomposition representation can be generalized a number of times,to extend the scope of its algebraic characteristics as much as possible. With these generalizations, the Morphological Shape Decomposition 's role to serve as an efficient image decomposition tool is extended to grayscale images.This work follows the above line, and further develops it. Anew evolutionary branch is added to the 3D Morphological Shape Decomposition's development, by the introduction of a 3D Multi Structuring Element Morphological Shape Decomposition, which permits 3D Morphological Shape Decomposition of 3D binary images (grayscale images) into "multiparameter" families of elements. At the beginning, 3D Morphological Shape Decomposition representations are based only on "1 parameter" families of elements for image decomposition.This paper addresses the gray scale inter frame interpolation by means of mathematical morphology. The new interframe interpolation method is based on generalized morphological 3D Shape Decomposition. This article will present the theoretical background of the morphological interframe interpolation, deduce the new representation and show some application examples.Computer simulations could illustrate results.
Abstract: Local Linear Neuro-Fuzzy Models (LLNFM) like other neuro- fuzzy systems are adaptive networks and provide robust learning capabilities and are widely utilized in various applications such as pattern recognition, system identification, image processing and prediction. Local linear model tree (LOLIMOT) is a type of Takagi-Sugeno-Kang neuro fuzzy algorithm which has proven its efficiency compared with other neuro fuzzy networks in learning the nonlinear systems and pattern recognition. In this paper, a dedicated reconfigurable and parallel processing hardware for LOLIMOT algorithm and its applications are presented. This hardware realizes on-chip learning which gives it the capability to work as a standalone device in a system. The synthesis results on FPGA platforms show its potential to improve the speed at least 250 of times faster than software implemented algorithms.
Abstract: Advancement in Artificial Intelligence has lead to the
developments of various “smart" devices. Character recognition
device is one of such smart devices that acquire partial human
intelligence with the ability to capture and recognize various
characters in different languages. Firstly multiscale neural training
with modifications in the input training vectors is adopted in this
paper to acquire its advantage in training higher resolution character
images. Secondly selective thresholding using minimum distance
technique is proposed to be used to increase the level of accuracy of
character recognition. A simulator program (a GUI) is designed in
such a way that the characters can be located on any spot on the
blank paper in which the characters are written. The results show that
such methods with moderate level of training epochs can produce
accuracies of at least 85% and more for handwritten upper case
English characters and numerals.
Abstract: In digital signal processing it is important to
approximate multi-dimensional data by the method called rank
reduction, in which we reduce the rank of multi-dimensional data from
higher to lower. For 2-dimennsional data, singular value
decomposition (SVD) is one of the most known rank reduction
techniques. Additional, outer product expansion expanded from SVD
was proposed and implemented for multi-dimensional data, which has
been widely applied to image processing and pattern recognition.
However, the multi-dimensional outer product expansion has behavior
of great computation complex and has not orthogonally between the
expansion terms. Therefore we have proposed an alterative method,
Third-order Orthogonal Tensor Product Expansion short for 3-OTPE.
3-OTPE uses the power method instead of nonlinear optimization
method for decreasing at computing time. At the same time the group
of B. D. Lathauwer proposed Higher-Order SVD (HOSVD) that is
also developed with SVD extensions for multi-dimensional data.
3-OTPE and HOSVD are similarly on the rank reduction of
multi-dimensional data. Using these two methods we can obtain
computation results respectively, some ones are the same while some
ones are slight different. In this paper, we compare 3-OTPE to
HOSVD in accuracy of calculation and computing time of resolution,
and clarify the difference between these two methods.
Abstract: One major source of performance decline in speaker
recognition system is channel mismatch between training and testing.
This paper focuses on improving channel robustness of speaker
recognition system in two aspects of channel compensation technique
and channel robust features. The system is text-independent speaker
identification system based on two-stage recognition. In the aspect of
channel compensation technique, this paper applies MAP (Maximum
A Posterior Probability) channel compensation technique, which was
used in speech recognition, to speaker recognition system. In the
aspect of channel robust features, this paper introduces
pitch-dependent features and pitch-dependent speaker model for the
second stage recognition. Based on the first stage recognition to
testing speech using GMM (Gaussian Mixture Model), the system
uses GMM scores to decide if it needs to be recognized again. If it
needs to, the system selects a few speakers from all of the speakers
who participate in the first stage recognition for the second stage
recognition. For each selected speaker, the system obtains 3
pitch-dependent results from his pitch-dependent speaker model, and
then uses ANN (Artificial Neural Network) to unite the 3
pitch-dependent results and 1 GMM score for getting a fused result.
The system makes the second stage recognition based on these fused
results. The experiments show that the correct rate of two-stage
recognition system based on MAP channel compensation technique
and pitch-dependent features is 41.7% better than the baseline system
for closed-set test.
Abstract: This paper presents a recognition system for isolated
words like robot commands. It’s carried out by Time Delay Neural
Networks; TDNN. To teleoperate a robot for specific tasks as turn,
close, etc… In industrial environment and taking into account the
noise coming from the machine. The choice of TDNN is based on its
generalization in terms of accuracy, in more it acts as a filter that
allows the passage of certain desirable frequency characteristics of
speech; the goal is to determine the parameters of this filter for
making an adaptable system to the variability of speech signal and to
noise especially, for this the back propagation technique was used in
learning phase. The approach was applied on commands pronounced
in two languages separately: The French and Arabic. The results for
two test bases of 300 spoken words for each one are 87%, 97.6% in
neutral environment and 77.67%, 92.67% when the white Gaussian
noisy was added with a SNR of 35 dB.
Abstract: Hand gesture is an active area of research in the vision
community, mainly for the purpose of sign language recognition and
Human Computer Interaction. In this paper, we propose a system to
recognize alphabet characters (A-Z) and numbers (0-9) in real-time
from stereo color image sequences using Hidden Markov Models
(HMMs). Our system is based on three main stages; automatic segmentation
and preprocessing of the hand regions, feature extraction
and classification. In automatic segmentation and preprocessing stage,
color and 3D depth map are used to detect hands where the hand
trajectory will take place in further step using Mean-shift algorithm
and Kalman filter. In the feature extraction stage, 3D combined features
of location, orientation and velocity with respected to Cartesian
systems are used. And then, k-means clustering is employed for
HMMs codeword. The final stage so-called classification, Baum-
Welch algorithm is used to do a full train for HMMs parameters.
The gesture of alphabets and numbers is recognized using Left-Right
Banded model in conjunction with Viterbi algorithm. Experimental
results demonstrate that, our system can successfully recognize hand
gestures with 98.33% recognition rate.
Abstract: In this paper, a novel algorithm based on Ridgelet
Transform and support vector machine is proposed for human action
recognition. The Ridgelet transform is a directional multi-resolution
transform and it is more suitable for describing the human action by
performing its directional information to form spatial features
vectors. The dynamic transition between the spatial features is carried
out using both the Principal Component Analysis and clustering
algorithm K-means. First, the Principal Component Analysis is used
to reduce the dimensionality of the obtained vectors. Then, the kmeans
algorithm is then used to perform the obtained vectors to form
the spatio-temporal pattern, called set-of-labels, according to given
periodicity of human action. Finally, a Support Machine classifier is
used to discriminate between the different human actions. Different
tests are conducted on popular Datasets, such as Weizmann and
KTH. The obtained results show that the proposed method provides
more significant accuracy rate and it drives more robustness in very
challenging situations such as lighting changes, scaling and dynamic
environment