Abstract: Statement of the automatic speech recognition
problem, the assignment of speech recognition and the application
fields are shown in the paper. At the same time as Azerbaijan speech,
the establishment principles of speech recognition system and the
problems arising in the system are investigated. The computing algorithms of speech features, being the main part
of speech recognition system, are analyzed. From this point of view,
the determination algorithms of Mel Frequency Cepstral Coefficients
(MFCC) and Linear Predictive Coding (LPC) coefficients expressing
the basic speech features are developed. Combined use of cepstrals of
MFCC and LPC in speech recognition system is suggested to
improve the reliability of speech recognition system. To this end, the
recognition system is divided into MFCC and LPC-based recognition
subsystems. The training and recognition processes are realized in
both subsystems separately, and recognition system gets the decision
being the same results of each subsystems. This results in decrease of
error rate during recognition. The training and recognition processes are realized by artificial
neural networks in the automatic speech recognition system. The
neural networks are trained by the conjugate gradient method. In the
paper the problems observed by the number of speech features at
training the neural networks of MFCC and LPC-based speech
recognition subsystems are investigated. The variety of results of neural networks trained from different
initial points in training process is analyzed. Methodology of
combined use of neural networks trained from different initial points
in speech recognition system is suggested to improve the reliability
of recognition system and increase the recognition quality, and
obtained practical results are shown.
Abstract: Analysis of vocal fold vibration is essential for understanding the mechanism of voice production and for improving clinical assessment of voice disorders. This paper presents a Dynamic Time Warping (DTW) based approach to analyze and objectively classify vocal fold vibration patterns. The proposed technique was designed and implemented on a Glottal Area Waveform (GAW) extracted from high-speed laryngeal images by delineating the glottal edges for each image frame. Feature extraction from the GAW was performed using Linear Predictive Coding (LPC). Several types of voice reference templates from simulations of clear, breathy, fry, pressed and hyperfunctional voice productions were used. The patterns of the reference templates were first verified using the analytical signal generated through Hilbert transformation of the GAW. Samples from normal speakers’ voice recordings were then used to evaluate and test the effectiveness of this approach. The classification of the voice patterns using the technique of LPC and DTW gave the accuracy of 81%.
Abstract: In this paper, a novel method for a biometric system based on the ECG signal is proposed, using spectral coefficients computed through linear predictive coding (LPC). ECG biometric systems have traditionally incorporated characteristics of fiducial points of the ECG signal as the feature set. These systems have been shown to contain loopholes and thus a non-fiducial system allows for tighter security. In the proposed system, incorporating non-fiducial features from the LPC spectrum produced a segment and subject recognition rate of 99.52% and 100% respectively. The recognition rates outperformed the biometric system that is based on the wavelet packet decomposition (WPD) algorithm in terms of recognition rates and computation time. This allows for LPC to be used in a practical ECG biometric system that requires fast, stringent and accurate recognition.
Abstract: Vector quantization is a powerful tool for speech
coding applications. This paper deals with LPC Coding of speech
signals which uses a new technique called Multi Switched Split
Vector Quantization (MSSVQ), which is a hybrid of Multi, switched,
split vector quantization techniques. The spectral distortion
performance, computational complexity, and memory requirements
of MSSVQ are compared to split vector quantization (SVQ), multi
stage vector quantization(MSVQ) and switched split vector
quantization (SSVQ) techniques. It has been proved from results that
MSSVQ has better spectral distortion performance, lower
computational complexity and lower memory requirements when
compared to all the above mentioned product code vector
quantization techniques. Computational complexity is measured in
floating point operations (flops), and memory requirements is
measured in (floats).
Abstract: The speech signal conveys information about the
identity of the speaker. The area of speaker identification is
concerned with extracting the identity of the person speaking the
utterance. As speech interaction with computers becomes more
pervasive in activities such as the telephone, financial transactions
and information retrieval from speech databases, the utility of
automatically identifying a speaker is based solely on vocal
characteristic. This paper emphasizes on text dependent speaker
identification, which deals with detecting a particular speaker from a
known population. The system prompts the user to provide speech
utterance. System identifies the user by comparing the codebook of
speech utterance with those of the stored in the database and lists,
which contain the most likely speakers, could have given that speech
utterance. The speech signal is recorded for N speakers further the
features are extracted. Feature extraction is done by means of LPC
coefficients, calculating AMDF, and DFT. The neural network is
trained by applying these features as input parameters. The features
are stored in templates for further comparison. The features for the
speaker who has to be identified are extracted and compared with the
stored templates using Back Propogation Algorithm. Here, the
trained network corresponds to the output; the input is the extracted
features of the speaker to be identified. The network does the weight
adjustment and the best match is found to identify the speaker. The
number of epochs required to get the target decides the network
performance.
Abstract: This paper investigates the performance of a speech
recognizer in an interactive voice response system for various coded
speech signals, coded by using a vector quantization technique namely
Multi Switched Split Vector Quantization Technique. The process of
recognizing the coded output can be used in Voice banking application.
The recognition technique used for the recognition of the coded speech
signals is the Hidden Markov Model technique. The spectral distortion
performance, computational complexity, and memory requirements of
Multi Switched Split Vector Quantization Technique and the
performance of the speech recognizer at various bit rates have been
computed. From results it is found that the speech recognizer is
showing better performance at 24 bits/frame and it is found that the
percentage of recognition is being varied from 100% to 93.33% for
various bit rates.