Abstract: This paper presents a technical speaker adaptation
method called WMLLR, which is based on maximum likelihood linear
regression (MLLR). In MLLR, a linear regression-based transform
which adapted the HMM mean vectors was calculated to maximize the
likelihood of adaptation data. In this paper, the prior knowledge of the
initial model is adequately incorporated into the adaptation. A series of
speaker adaptation experiments are carried out at a 30 famous city
names database to investigate the efficiency of the proposed method.
Experimental results show that the WMLLR method outperforms the
conventional MLLR method, especially when only few utterances
from a new speaker are available for adaptation.
Abstract: Recently, a quality of motors is inspected by human
ears. In this paper, I propose two systems using a method of speech
recognition for automation of the inspection. The first system is based
on a method of linear processing which uses K-means and Nearest
Neighbor method, and the second is based on a method of non-linear
processing which uses neural networks. I used motor sounds in these
systems, and I successfully recognize 86.67% of motor sounds in the
linear processing system and 97.78% in the non-linear processing
system.
Abstract: Parsing is important in Linguistics and Natural
Language Processing to understand the syntax and semantics of a
natural language grammar. Parsing natural language text is
challenging because of the problems like ambiguity and inefficiency.
Also the interpretation of natural language text depends on context
based techniques. A probabilistic component is essential to resolve
ambiguity in both syntax and semantics thereby increasing accuracy
and efficiency of the parser. Tamil language has some inherent
features which are more challenging. In order to obtain the solutions,
lexicalized and statistical approach is to be applied in the parsing
with the aid of a language model. Statistical models mainly focus on
semantics of the language which are suitable for large vocabulary
tasks where as structural methods focus on syntax which models
small vocabulary tasks. A statistical language model based on Trigram
for Tamil language with medium vocabulary of 5000 words has
been built. Though statistical parsing gives better performance
through tri-gram probabilities and large vocabulary size, it has some
disadvantages like focus on semantics rather than syntax, lack of
support in free ordering of words and long term relationship. To
overcome the disadvantages a structural component is to be
incorporated in statistical language models which leads to the
implementation of hybrid language models. This paper has attempted
to build phrase structured hybrid language model which resolves
above mentioned disadvantages. In the development of hybrid
language model, new part of speech tag set for Tamil language has
been developed with more than 500 tags which have the wider
coverage. A phrase structured Treebank has been developed with 326
Tamil sentences which covers more than 5000 words. A hybrid
language model has been trained with the phrase structured Treebank
using immediate head parsing technique. Lexicalized and statistical
parser which employs this hybrid language model and immediate
head parsing technique gives better results than pure grammar and
trigram based model.
Abstract: Speech corpus is one of the major components in a
Speech Processing System where one of the primary requirements
is to recognize an input sample. The quality and details captured
in speech corpus directly affects the precision of recognition. The
current work proposes a platform for speech corpus generation using
an adaptive LMS filter and LPC cepstrum, as a part of an ANN
based Speech Recognition System which is exclusively designed to
recognize isolated numerals of Assamese language- a major language
in the North Eastern part of India. The work focuses on designing an
optimal feature extraction block and a few ANN based cooperative
architectures so that the performance of the Speech Recognition
System can be improved.
Abstract: In this paper, we study a class of serially concatenated block codes (SCBC) based on matrix interleavers, to be employed in fixed wireless communication systems. The performances of SCBC¬coded systems are investigated under various interleaver dimensions. Numerical results reveal that the matrix interleaver could be a competitive candidate over conventional block interleaver for frame lengths of 200 bits; hence, the SCBC coding based on matrix interleaver is a promising technique to be employed for speech transmission applications in many international standards such as pan-European Global System for Mobile communications (GSM), Digital Cellular Systems (DCS) 1800, and Joint Detection Code Division Multiple Access (JD-CDMA) mobile radio systems, where the speech frame contains around 200 bits.
Abstract: This paper presents the source extraction system which can extract only target signals with constraints on source localization in on-line systems. The proposed system is a kind of methods for enhancing a target signal and suppressing other interference signals. But, the performance of proposed system is superior to any other methods and the extraction of target source is comparatively complete. The method has a beamforming concept and uses an improved time-frequency (TF) mask-based BSS algorithm to separate a target signal from multiple noise sources. The target sources are assumed to be in front and test data was recorded in a reverberant room. The experimental results of the proposed method was evaluated by the PESQ score of real-recording sentences and showed a noticeable speech enhancement.
Abstract: In this paper we present an efficient system for
independent speaker speech recognition based on neural network
approach. The proposed architecture comprises two phases: a
preprocessing phase which consists in segmental normalization and
features extraction and a classification phase which uses neural
networks based on nonparametric density estimation namely the
general regression neural network (GRNN). The relative
performances of the proposed model are compared to the similar
recognition systems based on the Multilayer Perceptron (MLP), the
Recurrent Neural Network (RNN) and the well known Discrete
Hidden Markov Model (HMM-VQ) that we have achieved also.
Experimental results obtained with Arabic digits have shown that the
use of nonparametric density estimation with an appropriate
smoothing factor (spread) improves the generalization power of the
neural network. The word error rate (WER) is reduced significantly
over the baseline HMM method. GRNN computation is a successful
alternative to the other neural network and DHMM.