Abstract: Distant-talking voice-based HCI systems suffer from
performance degradation due to mismatch between the acoustic
speech (runtime) and the acoustic model (training). Mismatch is
caused by the change in the power of the speech signal as observed at
the microphones. This change is greatly influenced by the change in
distance, affecting speech dynamics inside the room before reaching
the microphones. Moreover, as the speech signal is reflected, its
acoustical characteristic is also altered by the room properties. In
general, power mismatch due to distance is a complex problem. This
paper presents a novel approach in dealing with distance-induced
mismatch by intelligently sensing instantaneous voice power variation
and compensating model parameters. First, the distant-talking speech
signal is processed through microphone array processing, and the
corresponding distance information is extracted. Distance-sensitive
Gaussian Mixture Models (GMMs), pre-trained to capture both
speech power and room properties, are used to predict the optimal
distance of the speech source. Consequently, pre-computed statistical
priors corresponding to the optimal distance are selected to correct
the statistics of the generic model, which was frozen during training.
Thus, model combinatorics are post-conditioned to match the power
of instantaneous speech acoustics at runtime. This results in an
improved likelihood of predicting the correct speech command at
farther distances. We experiment using real data recorded inside two
rooms. Experimental evaluation shows voice recognition performance
using our method is more robust to the change in distance compared
to the conventional approach. In our experiment, under the most
acoustically challenging environment (i.e., Room 2: 2.5 meters), our
method achieved a 24.2% improvement in recognition performance
over the best-performing conventional method.
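The distance-prediction step described above can be illustrated with a minimal sketch. It assumes each candidate distance has its own diagonal-covariance GMM and that the distance whose GMM gives the highest total log-likelihood over the observed frames is selected. The function names, the one-dimensional "power" feature and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a diagonal Gaussian evaluated at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, components):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def select_distance(frames, distance_gmms):
    """Return the distance whose GMM best explains all frames."""
    scores = {
        d: sum(log_gmm(x, gmm) for x in frames)
        for d, gmm in distance_gmms.items()
    }
    return max(scores, key=scores.get)

# Hypothetical models keyed by distance in meters (made-up parameters).
distance_gmms = {
    0.5: [(1.0, [10.0], [1.0])],
    1.5: [(0.5, [6.0], [1.0]), (0.5, [7.0], [1.0])],
    2.5: [(1.0, [3.0], [1.0])],
}
frames = [[6.2], [6.8], [7.1]]
best = select_distance(frames, distance_gmms)
print(best)
```

In a full system the selected distance would then index the pre-computed priors used to correct the generic acoustic model; that step is not shown here.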
Abstract: With the emergence of new applications such as robot
control through image processing, artificial vision for visual
servoing is a rapidly growing discipline, and human-machine
interaction plays a significant role in controlling robots. This
paper presents a new algorithm based on spatio-temporal volumes for
visual servoing, aimed at controlling robots. In this algorithm, after
applying the necessary pre-processing to video frames, a spatio-temporal
volume is constructed for each gesture and a feature vector is extracted.
These volumes are then analyzed for matching in two consecutive
stages. For hand gesture recognition and classification we tested
different classifiers, including k-nearest neighbor, learning vector
quantization and back-propagation neural networks. We tested the
proposed algorithm on the collected data set, and the results showed a
correct gesture recognition rate of 99.58 percent. We also tested the
algorithm on noisy images, where it achieved a correct
recognition rate of 97.92 percent.
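Of the classifiers listed above, k-nearest neighbor is the simplest to illustrate. The following is a minimal sketch that assumes gesture feature vectors are plain lists of numbers compared by Euclidean distance. The two-dimensional vectors and the labels "stop" and "go" are made up for illustration and do not come from the paper's data set.

```python
import math
from collections import Counter

def knn_predict(query, train_vectors, train_labels, k=3):
    """Classify query by majority vote among its k nearest training vectors."""
    dists = sorted(
        (math.dist(query, v), label)
        for v, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors for two gesture classes.
train_vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_labels = ["stop", "stop", "stop", "go", "go", "go"]
print(knn_predict([0.2, 0.3], train_vectors, train_labels))
```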
Abstract: Several works on facial recognition have dealt with methods that identify isolated characteristics of the face or with templates that encompass several regions of it. In this paper we introduce a new technique that approaches the problem holistically, dispensing with the need to identify geometrical characteristics or regions of the face. A face is characterized by randomly sampling selected attributes of the pixels of its image. From this information we construct a data set corresponding to the values of low frequencies, gradient, entropy and several other characteristics of the pixels of the image, generating a set of "p" variables. The multivariate data set is approximated with different polynomials that minimize the data fitting error in the minimax sense (L∞ norm). Using a Genetic Algorithm (GA), we are able to circumvent the problem of dimensionality inherent to higher-degree polynomial approximations. The GA yields the degree and the values of a set of coefficients of the polynomials approximating the image of a face. The system is trained by finding, through a resampling process, a family of characteristic polynomials in several variables (pixel characteristics) for each face (say Fi) in the database. A face (say F) is recognized by finding its characteristic polynomials and using an AdaBoost classifier to compare F's polynomials to each of the Fi's polynomials. The winner is the polynomial family closest to F's, corresponding to the target face in the database.
Abstract: Three-dimensional reconstruction of small objects has
been one of the most challenging problems over the last decade.
Computer graphics researchers and photography professionals have
been working on improving 3D reconstruction algorithms to fit the
high demands of various real life applications. Medical sciences,
animation industry, virtual reality, pattern recognition, tourism
industry, and reverse engineering are common fields where 3D
reconstruction of objects plays a vital role. Both lack of accuracy and
high computational cost are the major challenges facing successful
3D reconstruction. Fringe projection has emerged as a promising 3D
reconstruction direction that combines low computational cost with both
high precision and high resolution. It employs digital projection,
structured light systems and phase analysis on fringed pictures.
Research studies have shown that the system has acceptable
performance, and moreover it is insensitive to ambient light.
This paper presents an overview of fringe projection approaches. It
also presents an experimental study and implementation of a simple
fringe projection system. We tested our system using two objects
with different materials and levels of detail. Experimental results
have shown that, while our system is simple, it produces acceptable
results.
Abstract: Face detection and recognition have many applications
in a variety of fields such as security systems, videoconferencing and
identification. Face classification is currently implemented in
software. A hardware implementation allows real-time processing,
but has higher cost and time-to-market.
The objective of this work is to implement a classifier based on
an MLP (Multi-layer Perceptron) neural network for face detection.
The MLP is used to classify face and non-face patterns. The system is
described in the C language on a P4 (2.4 GHz) to extract weight
values. Then a hardware implementation is achieved using a VHDL-based
methodology. We target a Xilinx FPGA as the implementation
platform.
Abstract: There have been significant improvements in automatic
voice recognition technology. However, existing systems still face difficulties,
particularly when used by non-native speakers with accents.
In this paper we address the problem of identifying the English accented
speech of speakers from different backgrounds. Once an accent is
identified, the speech recognition software can utilise a training set from
the appropriate accent and therefore improve the efficiency and accuracy
of the speech recognition system. We introduce the Q factor, which
is defined by the sum of relationships between frequencies of the
formants. Four different accents were considered in the experiments
for this research. A scoring method was introduced in order to
effectively analyse accents. The proposed concept indicates that an
accent can be identified by analysing its formants.
Abstract: Character segmentation is an important preprocessing
step for text recognition. In degraded documents, the existence of
touching characters decreases the recognition rate drastically for any
optical character recognition (OCR) system. In this paper we
propose a complete solution for segmenting touching characters in
all three zones of printed Gurmukhi script. A study of touching
Gurmukhi characters is carried out and these characters have been
divided into various categories after a careful analysis. Structural
properties of the Gurmukhi characters are used for defining the
categories. New algorithms have been proposed to segment the
touching characters in the middle, upper and lower zones.
These algorithms have shown a reasonable improvement in
segmenting the touching characters in degraded printed Gurmukhi
script. The algorithms proposed in this paper are applicable only to
machine printed text. We have also discussed a new and useful
technique to segment the horizontally overlapping lines.
Abstract: Although much research has been done on
human pose recognition, the camera view-point is still a critical
problem for the overall recognition system. In this paper, view-point
insensitive human pose recognition is proposed. The aims of the
proposed system are view-point insensitivity and real-time processing.
The recognition system consists of a feature extraction module, a neural
network and real-time feed-forward calculation. First, a histogram-based
method is used to extract features from the silhouette image, and it is
suitable for representing the shape of a human pose. To reduce the
dimension of the feature vector, Principal Component Analysis (PCA) is
used. Second, real-time processing is implemented using Compute
Unified Device Architecture (CUDA), which improves
the speed of the feed-forward calculation of the neural network. We
demonstrate the effectiveness of our approach with experiments in a
real environment.
Abstract: To increase the reliability of a face recognition system, the
system must be able to distinguish a real face from a copy of a face, such
as a photograph. In this paper, we propose a fast and memory-efficient
method of live face detection for embedded face recognition systems,
based on the analysis of the movement of the eyes. We detect eyes in
sequential input images and calculate the variation of each eye region to
determine whether the input face is a real face or not. Experimental
results show that the proposed approach is competitive and promising
for live face detection.
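The eye-variation test described above can be sketched as follows. The abstract does not say how the variation is measured, so this sketch makes two loudly labeled assumptions: the variation is the mean absolute intensity difference between the same eye region in consecutive frames, and a face counts as live when any such difference exceeds a fixed threshold. The function names, the threshold value and the pixel values are all hypothetical.

```python
def region_variation(prev_region, curr_region):
    """Mean absolute intensity difference between two same-sized eye regions."""
    total = 0
    count = 0
    for row_a, row_b in zip(prev_region, curr_region):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def is_live(eye_regions, threshold=10.0):
    """Assumed rule: live if any consecutive pair of regions differs enough."""
    pairs = zip(eye_regions, eye_regions[1:])
    return any(region_variation(a, b) > threshold for a, b in pairs)

# Hypothetical 2x3 grayscale eye crops: an open eye and a closed eye.
open_eye = [[200, 50, 200], [200, 50, 200]]
closed_eye = [[120, 120, 120], [120, 120, 120]]

photo_sequence = [open_eye, open_eye, open_eye]    # no change, like a photograph
blink_sequence = [open_eye, closed_eye, open_eye]  # a blink changes the region
print(is_live(photo_sequence), is_live(blink_sequence))
```

Working on small eye crops rather than whole frames keeps the memory footprint low, which fits the embedded setting the abstract targets.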
Abstract: This manuscript presents palmprint recognition with high
accuracy by combining different texture extraction approaches.
The Region of Interest (ROI) is decomposed into different frequency-time
sub-bands by a wavelet transform up to two levels, and only the
approximate image of two levels is selected, which is known as
the Approximate Image ROI (AIROI). This AIROI has information of
principal lines of the palm. The Competitive Index is used as the
features of the palmprint, in which six Gabor filters of different
orientations convolve with the palmprint image to extract the orientation
information from the image. The winner-take-all strategy
is used to select dominant orientation for each pixel, which is
known as Competitive Index. Further, PCA is applied to select highly
uncorrelated Competitive Index features, to reduce the dimensions of
the feature vector, and to project the features on Eigen space. The
similarity of two palmprints is measured by the Euclidean distance
metric. The algorithm is tested on the Hong Kong PolyU palmprint
database. AIROIs from different wavelet filter families are also
tested with the Competitive Index and PCA. The AIROI of the db7 wavelet
filter achieves an Equal Error Rate (EER) of 0.0152% and a Genuine
Acceptance Rate (GAR) of 99.67% on the palm database of Hong
Kong PolyU.
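The winner-take-all step that produces the Competitive Index can be illustrated with a short sketch. It assumes the six Gabor responses have already been computed and that the winning orientation at each pixel is the one with the largest response magnitude. That winner rule is an assumption of this sketch, since the abstract does not state it, and the response values below are made up.

```python
def competitive_index(responses):
    """Per-pixel index of the winning orientation.

    responses[k][r][c] is the filter response for orientation k at pixel (r, c).
    Assumed rule: the orientation with the largest absolute response wins.
    """
    n_orient = len(responses)
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [
            max(range(n_orient), key=lambda k: abs(responses[k][r][c]))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Hypothetical responses of six orientation filters on a 1x2 image.
responses = [
    [[0.1, 0.9]],
    [[0.5, 0.2]],
    [[-0.8, 0.1]],
    [[0.2, 0.3]],
    [[0.0, 0.0]],
    [[0.3, -0.4]],
]
index_map = competitive_index(responses)
print(index_map)
```

The resulting index map would then be reduced with PCA and compared by Euclidean distance as the abstract describes; those stages are omitted here.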
Abstract: Nowadays, web-based technologies influence
people's daily lives in areas such as education and business.
Consequently, many web developers are eager to build web
applications with fully animated graphics while forgetting their
accessibility to users. Their purpose is to make their web
applications look impressive. This paper therefore highlights the
usability and accessibility of a voice recognition browser as a tool to
help visually impaired and blind learners access a virtual
learning environment. More specifically, the objectives of the study
are (i) to explore the challenges faced by visually impaired
learners in accessing a virtual learning environment and (ii) to determine
suitable guidelines for developing a voice recognition browser
that is accessible to the visually impaired. This study
is based on an observation conducted with Malaysian
visually impaired learners. Finally, the results of this study
are intended to inform the development of an accessible voice recognition
browser for the visually impaired.
Abstract: Dealing with hundreds of features in character
recognition systems is not unusual. This large number of features
leads to an increase in the computational workload of the recognition
process. Many methods have been proposed to remove
unnecessary or redundant features and reduce feature dimensionality.
In addition, because of the characteristics of Farsi script, it is not
possible to apply algorithms designed for other languages to Farsi directly. In this
paper some methods for feature subset selection using genetic
algorithms are applied to a Farsi optical character recognition (OCR)
system. Experimental results show that application of genetic
algorithms (GA) to feature subset selection in a Farsi OCR results in
lower computational complexity and enhanced recognition rate.
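Feature subset selection with a genetic algorithm can be sketched as a search over bit masks, where bit i says whether feature i is kept. The sketch below is a generic GA (elitism, one-point crossover, bit-flip mutation) and is not the specific method evaluated in the paper. The fitness function is a hypothetical stand-in: in a real OCR system it would be something like the recognition rate on a validation set, possibly penalized by the number of selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              p_mut=0.05, seed=0):
    """Evolve a 0/1 mask over features and return the fittest mask found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [pop[0][:], pop[1][:]]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[: pop_size // 2], 2)  # parents from fitter half
            cut = rng.randrange(1, n_features)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Hypothetical fitness: features 0, 3 and 5 are "useful", and every
# selected feature costs 0.1 (a stand-in for computational workload).
USEFUL = {0, 3, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)

best_mask = ga_select(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

The per-feature penalty is one simple way to express the trade-off between recognition rate and computational cost that the abstract mentions.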
Abstract: In Thailand, the practice of pre-hospital Emergency
Medical Service (EMS) shows different growth
rates and levels of effectiveness from area to area, which appear as
differences in quality and quantity. To shorten the learning curve and
speed up the practice in other areas, storytelling and lessons learnt
from effective practices are valued as meaningful knowledge. This
paper aims to ascertain the factors, lessons learnt and best
practices that contribute to the success of a pre-hospital
EMS system. These were formulated as a model to
speed up the practice in other areas. To develop the model, the Malcolm
Baldrige National Quality Award (MBNQA), which is widely
recognized as a framework for organizational quality assessment and
improvement, was chosen as the discussion framework. Notably,
this study focuses on knowledge capture;
it does not aim to complete the loop of knowledge activities.
Rather, it highlights the recognition of knowledge
capture, which is the initiation of knowledge management.
Abstract: A comparison between the performance of Latin and
Arabic handwritten digit recognition problems is presented. The
performance of ten different classifiers is tested on two similar
Arabic and Latin handwritten digit databases. The analysis shows
that the Arabic handwritten digit recognition problem is easier than that
of Latin digits. This is because the interclass difference in the case of
Latin digits is smaller than in Arabic digits, and variances in writing
Latin digits are larger. Consequently, weaker yet fast classifiers are
expected to play a more prominent role in Arabic handwritten digit
recognition.
Abstract: Image data holds a large amount of different context
information. However, as of today, these resources remain largely
untouched. It is thus the aim of this paper to present a basic technical
framework which allows for a quick and easy exploitation of context
information from image data especially by non-expert users.
Furthermore, the proposed framework is discussed in detail
concerning important social and ethical issues which demand special
requirements in system design. Finally, a first sensor prototype is
presented which meets the identified requirements. Additionally,
necessary implications for the software and hardware design of the
system are discussed, rendering a sensor system which could be
regarded as a good, acceptable and justifiable technical solution, thereby
enabling the extraction of context information from image data.
Abstract: Support Vector Machines (SVMs) are a recent class of statistical classification and regression techniques playing an increasing role in detection problems across various engineering fields, notably statistical signal processing, pattern recognition, image analysis, and communication systems. In this paper, SVM is applied to an infrared (IR) binary communication system with different types of channel models, including Ricean multipath fading and partially developed scattering channels, with additive white Gaussian noise (AWGN) at the receiver. The structure and performance of SVM in terms of the bit error rate (BER) metric are derived and simulated for these stochastic channel models, and the computational complexity of the implementation, in terms of average computational time per bit, is also presented. The performance of SVM is then compared to classical binary signal maximum likelihood detection using a matched filter driven by On-Off keying (OOK) modulation. We found that the performance of SVM is superior to that of the traditional optimal detection schemes used in statistical communication, especially for very low signal-to-noise ratio (SNR) ranges. For large SNR, the performance of the SVM is similar to that of the classical detectors. The implication of these results is that SVM can prove very beneficial to IR communication systems, which notoriously suffer from low SNR, at the cost of increased computational complexity.
Abstract: SoftBoost is a recently presented boosting algorithm,
which trades off the size of achieved classification margin and
generalization performance. This paper presents a performance
evaluation of the SoftBoost algorithm on the generic object recognition
problem. An appearance-based generic object recognition
model is used. The evaluation experiments are performed using
a difficult object recognition benchmark. An assessment with respect
to different degrees of label noise as well as a comparison to
the well-known AdaBoost algorithm is performed. The obtained
results suggest that SoftBoost is preferable in cases
when the training data is known to have a high degree of noise.
Otherwise, using AdaBoost can achieve better performance.
Abstract: The goal of this project is to design a system to
recognize voice commands. Most voice recognition systems
contain two main modules: "feature extraction" and "feature
matching". In this project, the MFCC algorithm is used to simulate
the feature extraction module. Using this algorithm, the cepstral
coefficients are calculated on the mel frequency scale. The VQ (vector
quantization) method is used to reduce the amount of data and thereby
decrease computation time. In the feature matching stage, Euclidean
distance is applied as the similarity criterion. Because of the high accuracy
of the algorithms used, the accuracy of this voice command system is
high. Using these algorithms, with at least 5 repetitions of each
command in a single training session, and then two repetitions in each testing
session, a zero error rate in command recognition is achieved.
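The feature matching stage can be sketched as follows, assuming each command has a VQ codebook (a small set of codewords) and that an utterance is assigned to the command whose codebook gives the lowest average Euclidean distortion. MFCC extraction and codebook training are not shown. The two-dimensional vectors stand in for real MFCC frames, and the command names and values are hypothetical.

```python
import math

def avg_distortion(features, codebook):
    """Average Euclidean distance from each frame to its nearest codeword."""
    return sum(
        min(math.dist(f, c) for c in codebook) for f in features
    ) / len(features)

def recognize(features, codebooks):
    """Return the command whose codebook yields the lowest distortion."""
    return min(codebooks,
               key=lambda cmd: avg_distortion(features, codebooks[cmd]))

# Hypothetical codebooks, one per command (real codewords would be
# MFCC-sized vectors learned during the training session).
codebooks = {
    "start": [[1.0, 2.0], [2.0, 1.0]],
    "stop": [[8.0, 9.0], [9.0, 8.0]],
}
utterance = [[1.1, 1.9], [2.2, 0.8]]
print(recognize(utterance, codebooks))
```

Matching against a handful of codewords instead of every training frame is what gives VQ its reduction in data and computation time.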
Abstract: To improve the classification rate of face
recognition, a combination of features and a novel non-linear kernel are
proposed. The feature vector concatenates local binary patterns at three
different radii with Gabor wavelet features. The Gabor features are
the mean, standard deviation and skew of each scaling and
orientation parameter. The aim of the new kernel is to combine
the power of kernel methods with an optimal balance between
the features. To verify the effectiveness of the proposed method,
numerous methods are tested on four datasets, which
cover various emotions, orientations, configurations,
expressions and lighting conditions. Empirical results show the
superiority of the proposed technique when compared to other
methods.
Abstract: In this paper a new approach to face recognition is
presented that achieves double dimension reduction, making the
system computationally efficient with better recognition results, and
outperforms the common DCT technique of face recognition. In pattern
recognition techniques, the discriminative information of an image
increases with resolution up to a certain extent; consequently,
face recognition results change with face image resolution
and are optimal at a certain resolution
level. In the proposed model of face recognition, an image
decimation algorithm is first applied to the face image for dimension
reduction to the resolution level that provides the best
recognition results. The Discrete Cosine Transform (DCT) is then
applied to the face image because of its computational speed and feature
extraction potential. A subset of DCT coefficients from low to
mid frequencies that represents the face adequately and provides the best
recognition results is retained. A tradeoff between the decimation factor,
the number of DCT coefficients retained and the recognition rate with
minimum computation is obtained. Preprocessing of the image is
carried out to increase its robustness against variations in pose and
illumination level. This new model has been tested on different
databases, including the ORL, Yale and EME color databases.
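The double dimension reduction described above (decimation followed by retaining a subset of DCT coefficients) can be sketched with the standard library alone. This sketch makes several assumptions: decimation is plain subsampling without filtering, the DCT is the orthonormal 2-D DCT-II computed directly from its definition, and the retained subset is the top-left `keep` x `keep` block of coefficients. The paper's actual decimation algorithm and coefficient selection rule may differ, and the function names are hypothetical.

```python
import math

def decimate(image, factor):
    """Keep every factor-th pixel in both directions (no anti-alias filter)."""
    return [row[::factor] for row in image[::factor]]

def dct2(block):
    """Orthonormal 2-D DCT-II computed directly from the definition."""
    n_rows, n_cols = len(block), len(block[0])

    def alpha(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            s = 0.0
            for x in range(n_rows):
                for y in range(n_cols):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n_rows))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n_cols)))
            out[u][v] = alpha(u, n_rows) * alpha(v, n_cols) * s
    return out

def low_frequency_features(image, factor, keep):
    """Decimate, transform, and keep the top-left keep x keep coefficients."""
    coeffs = dct2(decimate(image, factor))
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

# A flat 8x8 test image: all energy should land in the DC coefficient.
flat_image = [[10.0] * 8 for _ in range(8)]
features = low_frequency_features(flat_image, factor=2, keep=2)
print(features)
```

The direct four-loop DCT is quadratic in the number of pixels and is only meant to make the definition visible; a practical system would use a fast transform. Decimating first shrinks the block the DCT has to process, which is where the computational saving comes from.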