Abstract: This paper states the automatic speech recognition problem and presents the task of speech recognition and its application fields. The principles of building a speech recognition system for Azerbaijani speech and the problems arising in such a system are investigated. The algorithms for computing speech features, which form the main part of a speech recognition system, are analyzed. From this point of view, algorithms for determining the Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients that express the basic speech features are developed. The combined use of MFCC and LPC cepstra is suggested to improve the reliability of the speech recognition system. To this end, the recognition system is divided into MFCC-based and LPC-based recognition subsystems. The training and recognition processes are carried out separately in each subsystem, and the system accepts a decision only when both subsystems produce the same result, which decreases the error rate during recognition. The training and recognition processes are realized by artificial neural networks trained with the conjugate gradient method. The paper investigates the problems caused by the number of speech features when training the neural networks of the MFCC-based and LPC-based subsystems. The variation in the results of neural networks trained from different initial points is analyzed. A methodology for the combined use of neural networks trained from different initial points is suggested to improve the reliability of the recognition system and increase recognition quality, and the practical results obtained are shown.
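The agreement rule described above, in which the system accepts a result only when the MFCC-based and LPC-based subsystems return the same label, can be sketched as follows. This is a minimal illustration; the function name and the example word labels are hypothetical, not taken from the paper:

```python
def fused_decision(mfcc_label, lpc_label):
    """Accept a recognition result only when both subsystems agree.

    mfcc_label, lpc_label: labels returned by the MFCC-based and
    LPC-based recognition subsystems for the same utterance.
    Returns the agreed label, or None to signal a rejection
    (e.g. ask the speaker to repeat the utterance).
    """
    if mfcc_label == lpc_label:
        return mfcc_label
    return None

# Both subsystems agree -> the word is accepted.
print(fused_decision("salam", "salam"))  # -> salam
# Disagreement -> the utterance is rejected rather than misrecognized,
# which is how the combined system lowers its error rate.
print(fused_decision("salam", "kitab"))  # -> None
```

The design choice is a trade-off: rejections increase, but accepted results are more reliable because an error must occur identically in both feature domains to pass.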
Abstract: This study investigates the cleaning performance of a high-intensity 360 kHz frequency in removing nano-dimensional and sub-micron particles from various surfaces, the uniformity of the cleaning tank, and the run-to-run variation of the cleaning process. The uniformity of the cleaning tank was measured by two different methods: (1) a ppb™ meter and (2) the Liquid Particle Counting (LPC) technique. The results indicate that the energy was distributed more uniformly throughout the entire cleaning vessel, even at the corners and edges of the tank, when megasonic sweeping technology was applied. The results also show that rinsing the parts at the 360 kHz frequency in the final rinse gives lower particle counts, and hence higher cleaning efficiency, compared to other frequencies. When megasonic sweeping technology is applied, each piezoelectric transducer operates at its optimum resonant frequency and generates a stronger acoustic cavitational force and a higher acoustic streaming velocity. These combined forces help to enhance particle removal and at the same time improve the overall cleaning performance. A multiple-extraction study was also carried out at various frequencies to measure the cleaning potential and the asymptote value.
Abstract: Analysis of vocal fold vibration is essential for understanding the mechanism of voice production and for improving the clinical assessment of voice disorders. This paper presents a Dynamic Time Warping (DTW) based approach to analyze and objectively classify vocal fold vibration patterns. The proposed technique was designed and implemented on a Glottal Area Waveform (GAW) extracted from high-speed laryngeal images by delineating the glottal edges in each image frame. Feature extraction from the GAW was performed using Linear Predictive Coding (LPC). Several types of voice reference templates from simulations of clear, breathy, fry, pressed and hyperfunctional voice productions were used. The patterns of the reference templates were first verified using the analytic signal generated through the Hilbert transformation of the GAW. Samples from normal speakers' voice recordings were then used to evaluate and test the effectiveness of this approach. The classification of the voice patterns using the combined LPC and DTW technique gave an accuracy of 81%.
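As a rough illustration of the DTW matching used above, here is a minimal dynamic-programming implementation for two 1-D sequences. The sequences below are toy data, not GAW features:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic Dynamic Time Warping distance between two 1-D sequences.

    Fills a cumulative-cost matrix D where D[i, j] is the cost of the
    best warping path aligning a[:i] with b[:j].
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow match, insertion, or deletion steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Identical sequences align at zero cost, and so does a time-shifted
# copy, which is exactly why DTW suits variable-rate vibration cycles.
print(dtw_distance([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))  # -> 0.0
print(dtw_distance([0, 0, 1, 2, 1], [0, 1, 2, 1, 1]))  # -> 0.0
```

Classification then amounts to computing this distance between an input feature sequence and each reference template and picking the nearest template.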
Abstract: Aldehyde oxidase is a molybdo-flavoenzyme involved in the oxidation of hundreds of endogenous and exogenous N-heterocyclic compounds and environmental pollutants. Uncharged N-heterocyclic aromatic compounds such as phenanthridine are widely distributed pollutants in soil, air, sediments, surface water and groundwater, and in animal and plant tissues. Phenanthridine, as an uncharged N-heterocyclic aromatic compound, was incubated with partially purified aldehyde oxidase from rainbow trout fish liver. A reversed-phase HPLC method was used to separate the oxidation products of phenanthridine, and 6(5H)-phenanthridinone was identified as the major metabolite, as it is also the major metabolite produced by mammalian aldehyde oxidase. Kinetic constants for the oxidation reaction were determined spectrophotometrically and showed that this substrate has a good affinity for hepatic aldehyde oxidase (Km = 78 ± 7.6 µM), coupled with a relatively high oxidation rate (0.77 ± 0.03 nmol/min/mg protein). These kinetic parameters indicate that in vitro biotransformation by hepatic fish aldehyde oxidase will be a significant pathway. This study confirms that partially purified aldehyde oxidase from fish liver is indeed the enzyme responsible for the in vitro production of the 6(5H)-phenanthridinone metabolite.
Abstract: This paper presents a comparative evaluation of feature extraction algorithms for a real-time isolated word recognition system based on an FPGA. The Mel-frequency cepstral, linear frequency cepstral, linear predictive and their cepstral coefficients were implemented in a hardware/software design. The proposed system was investigated in speaker-dependent mode for 100 different Lithuanian words. The robustness of the feature extraction algorithms was tested by recognizing speech records at different signal-to-noise ratios. The experiments on clean records show the highest accuracy for the Mel-frequency cepstral and linear frequency cepstral coefficients. For records with a 15 dB signal-to-noise ratio, the linear predictive cepstral coefficients give the best result. The hardware and software parts of the system are clocked at 50 MHz and 100 MHz, respectively. For classification, a pipelined dynamic time warping core was implemented. The proposed word recognition system satisfies the real-time requirements and is suitable for applications in embedded systems.
Abstract: Analysis and visualization of microarray data is very helpful for biologists and clinicians in the diagnosis and treatment of patients. It allows clinicians to better understand the structure of microarrays and facilitates understanding gene expression in cells. However, a microarray dataset is a complex data set with thousands of features and a very small number of observations. This very high dimensional data set often contains noise, non-useful information and only a small number of features relevant to the disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm, Local Principal Component (LPC), which aims to map high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.
Abstract: Bangla vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition of Bangla vowels. In this paper, Bangla vowels in isolated words have been analyzed based on a speech production model within the framework of Analysis-by-Synthesis. This has led to the extraction of spectral parameters for the production model in order to produce different Bangla vowel sounds. The real and synthetic spectra are compared, and a weighted square error has been computed along with the error in the formant bandwidths for efficient representation of Bangla vowels. The extracted features produced a good representation of the targeted Bangla vowels. Such a representation also plays an essential role in low bit rate speech coding and vocoders.
Abstract: A set of Artificial Neural Network (ANN) based methods
for the design of an effective system of speech recognition of
numerals of Assamese language captured under varied recording
conditions and moods is presented here. The work is related to the formulation of several ANN models configured to use Linear Predictive Coding (LPC), Principal Component Analysis (PCA) and other features to tackle mood and gender variations in the uttered numerals as part of an Automatic Speech Recognition (ASR) system in Assamese. The ANN models are designed using a combination of Self Organizing Map (SOM) and Multi Layer Perceptron (MLP) constituting a Learning Vector Quantization (LVQ) block trained in a cooperative environment to handle male and female speech samples of numerals of Assamese, a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations when handling speech samples with gender-based differences captured by a microphone in four different conditions, viz. noiseless, noise-mixed, stressed and stress-free.
Abstract: This paper presents an ESN-based Arabic phoneme
recognition system trained with supervised, forced, and combined supervised/forced learning algorithms. Mel-Frequency Cepstral Coefficients (MFCCs) and Linear Predictive Coding (LPC) techniques are used and compared as the input feature extraction technique. The system is evaluated using 6 speakers from the King Abdulaziz Arabic Phonetics Database (KAPD) for the Saudi Arabian dialect and 34 speakers from the Center for Spoken Language Understanding (CSLU2002) database of speakers with different dialects from 12 Arab countries. Results for the KAPD and CSLU2002 Arabic databases show phoneme recognition performances of 72.31% and 38.20%, respectively.
Abstract: In this paper, a novel method for a biometric system based on the ECG signal is proposed, using spectral coefficients computed through linear predictive coding (LPC). ECG biometric systems have traditionally incorporated characteristics of fiducial points of the ECG signal as the feature set. These systems have been shown to contain loopholes and thus a non-fiducial system allows for tighter security. In the proposed system, incorporating non-fiducial features from the LPC spectrum produced a segment and subject recognition rate of 99.52% and 100% respectively. The recognition rates outperformed the biometric system that is based on the wavelet packet decomposition (WPD) algorithm in terms of recognition rates and computation time. This allows for LPC to be used in a practical ECG biometric system that requires fast, stringent and accurate recognition.
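The LPC spectral coefficients used above come from fitting an all-pole model to a signal frame; a common way to obtain the predictor coefficients is the autocorrelation method with the Levinson-Durbin recursion. A minimal sketch follows; the test frame is a synthetic damped sinusoid, not an ECG segment, and the model order is an illustrative choice:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients via the autocorrelation method and the
    Levinson-Durbin recursion. Returns coefficients c[1..order]
    such that x[n] is predicted by sum_k c[k] * x[n-k]."""
    frame = np.asarray(frame, float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err
        a[1:i + 1] = a[1:i + 1] + k * np.concatenate((a[1:i][::-1], [1.0]))
        err *= (1.0 - k * k)
    return -a[1:]  # predictor coefficients

# A slowly decaying sinusoid is modelled well by a 2nd-order all-pole
# filter: the estimated coefficients approach 2*r*cos(w) and -r**2.
t = np.arange(200)
x = np.exp(-0.01 * t) * np.sin(0.3 * t)
print(lpc(x, 2))
```

A full system would compute such coefficients per frame and derive spectral features from them; the recursion itself is the standard O(order²) solver for the Toeplitz normal equations.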
Abstract: Vector quantization is a powerful tool for speech coding applications. This paper deals with LPC coding of speech signals using a new technique called Multi Switched Split Vector Quantization. This is a hybrid of two product code vector quantization techniques, namely the multi-stage vector quantization technique and the switched split vector quantization technique. The Multi Switched Split Vector Quantization technique quantizes the linear predictive coefficients in terms of line spectral frequencies. The results show that Multi Switched Split Vector Quantization provides a better trade-off between bit rate and spectral distortion performance, computational complexity and memory requirements when compared to the Switched Split Vector Quantization, multi-stage vector quantization, and Split Vector Quantization techniques. By employing the switching technique at each stage of the vector quantizer, the spectral distortion, computational complexity and memory requirements were greatly reduced. Spectral distortion was measured in dB, computational complexity in floating point operations (flops), and memory requirements in floats.
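The product-code idea behind the split vector quantization component above, dividing a parameter vector into subvectors and quantizing each with its own small codebook, can be sketched as follows. The codebooks here are toy examples, not trained line-spectral-frequency codebooks:

```python
import numpy as np

def split_vq(vector, codebooks):
    """Quantize `vector` by splitting it into parts, one per codebook,
    and picking the nearest codeword for each part independently.
    Returns the chosen indices and the reconstructed vector."""
    vector = np.asarray(vector, float)
    indices, parts, start = [], [], 0
    for cb in codebooks:
        cb = np.asarray(cb, float)
        dim = cb.shape[1]
        sub = vector[start:start + dim]
        # Nearest codeword by squared Euclidean distance.
        idx = int(np.argmin(((cb - sub) ** 2).sum(axis=1)))
        indices.append(idx)
        parts.append(cb[idx])
        start += dim
    return indices, np.concatenate(parts)

# A 4-dimensional vector split into two 2-dimensional subvectors.
cb1 = [[0.0, 0.0], [1.0, 1.0]]
cb2 = [[0.5, 0.5], [2.0, 2.0]]
idx, rec = split_vq([0.9, 1.1, 1.9, 2.2], [cb1, cb2])
print(idx, rec)  # indices -> [1, 1]
```

Splitting keeps each codebook small, which is where the complexity and memory savings of the split and switched-split families come from; the multi-stage variant then quantizes the residual of one stage with the next.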
Abstract: The standard investigational method for obstructive
sleep apnea syndrome (OSAS) diagnosis is polysomnography (PSG),
which consists of a simultaneous, usually overnight recording of
multiple electro-physiological signals related to sleep and
wakefulness. This is an expensive and encumbering protocol that is not readily repeated, and therefore there is a need for simpler and easily implemented screening and detection techniques. Identification of apnea/hypopnea events in the screening recordings is the key factor for the diagnosis of OSAS. The analysis of a single-lead electrocardiographic (ECG) signal alone for OSAS diagnosis, which may be done with portable devices at the patient's home, has been the challenge of recent years. A novel artificial neural network (ANN) based approach for feature extraction and automatic identification of respiratory events in ECG signals is presented in this paper. A nonlinear principal component analysis (NLPCA) method was considered for feature extraction and a support vector machine for classification/recognition. An alternative representation of the respiratory events by means of a Kohonen-type neural network is discussed. Our prospective study was based on OSAS patients of the
Clinical Hospital of Pneumology from Iaşi, Romania, males and
females, as well as on non-OSAS investigated human subjects. Our
computed analysis includes a learning phase based on cross signal
PSG annotation.
Abstract: Vector quantization is a powerful tool for speech
coding applications. This paper deals with LPC Coding of speech
signals which uses a new technique called Multi Switched Split
Vector Quantization (MSSVQ), which is a hybrid of the multi-stage, switched, and split vector quantization techniques. The spectral distortion performance, computational complexity, and memory requirements of MSSVQ are compared to the split vector quantization (SVQ), multi-stage vector quantization (MSVQ) and switched split vector quantization (SSVQ) techniques. The results show that MSSVQ has better spectral distortion performance, lower computational complexity and lower memory requirements when compared to all of the above mentioned product code vector quantization techniques. Computational complexity is measured in floating point operations (flops), and memory requirements in floats.
Abstract: This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words was established first; these words are "word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". The chosen words are associated with executing computer functions such as opening a file, printing a text document, cutting, copying, pasting, editing and exiting. Each word is introduced to the computer and then subjected to a feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker-dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; those are used for information retrieval.
The system consists of three parts: speech processing and feature extraction, training and testing using neural networks, and information retrieval.
The retrieval process proved to be 79.5-88% successful, which is quite acceptable considering the variations in the surroundings, the state of the speaker, and the microphone type.
Abstract: The adaptive power control of Code Division Multiple
Access (CDMA) communications using Remote Radio Head
(RRH) between multiple Unmanned Aerial Vehicles (UAVs) with
a link-budget based Signal-to-Interference Ratio (SIR) estimate is
applied to four inner loop power control algorithms. It is concluded that the Base Station (BS) can calculate not only the UAV distance, using the linearity between speed and the Consecutive Transmit-Power-Control Ratio (CTR) of Adaptive Step-size Closed Loop Power Control (AS-CLPC), Consecutive TPC Ratio Step-size Closed Loop Power Control (CS-CLPC) and Fixed Step-size Power Control (FSPC), but also the UAV position, using the Received Signal Strength Indicator (RSSI) ratio of the RRHs.
Abstract: This paper presents a vocoder that obtains high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model that extracts a few coding parameters; three consecutive frames are grouped into a superframe and joint vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for the different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4 kbps LPC10e and approximately the same as that of 2.4 kbps MELP, with high robustness.
Abstract: This paper presents a formant-tracking linear prediction
(FTLP) model for speech processing in noise. The main focus of this
work is the detection of formant trajectory based on Hidden Markov
Models (HMM), for improved formant estimation in noise. The
approach proposed in this paper provides a systematic framework for modelling and utilizing a time sequence of spectral peaks that satisfies continuity constraints on the parameters; the peaks themselves are modelled by the LP parameters. The formant-tracking LP model estimation is composed of three stages: (1) a pre-cleaning multi-band spectral subtraction stage to reduce the effect of residual noise on the formants; (2) an estimation stage where an initial estimate of the LP model of speech for each frame is obtained; (3) a formant classification stage using probability models of the formants and Viterbi decoders. The evaluation results for the estimation of the formant-tracking LP model, tested against a Gaussian white noise background, demonstrate that the proposed combination of the initial noise reduction stage with formant tracking and variable-order LPC analysis results in a significant reduction in errors and distortions. The performance was evaluated with noisy natural vowels extracted from French and English vocabulary speech signals at an SNR of 10 dB. In each case, the estimated formants are compared to reference formants.
Abstract: The speech signal conveys information about the
identity of the speaker. The area of speaker identification is
concerned with extracting the identity of the person speaking the
utterance. As speech interaction with computers becomes more pervasive in activities such as telephone use, financial transactions and information retrieval from speech databases, so does the utility of automatically identifying a speaker based solely on vocal characteristics. This paper focuses on text-dependent speaker identification, which deals with detecting a particular speaker from a known population. The system prompts the user to provide a speech utterance. The system identifies the user by comparing the codebook of the utterance with those stored in the database, and lists the most likely speakers who could have given that utterance. The speech signal is recorded for N speakers and the features are extracted. Feature extraction is done by means of LPC coefficients, the AMDF, and the DFT. A neural network is trained by applying these features as input parameters. The features are stored in templates for further comparison. The features of the speaker to be identified are extracted and compared with the stored templates using the Back Propagation algorithm. Here, the trained network corresponds to the output, while the input is the extracted features of the speaker to be identified. The network adjusts its weights and the best match is found to identify the speaker. The number of epochs required to reach the target determines the network performance.
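Of the features listed above, the AMDF (Average Magnitude Difference Function) is the simplest to illustrate: for a periodic signal it dips at lags equal to the pitch period. A minimal sketch on a synthetic signal follows; the signal and lag range are illustrative choices, not those used in the paper:

```python
import numpy as np

def amdf(x, max_lag):
    """Average Magnitude Difference Function for lags 1..max_lag.
    For a periodic signal, the value is near zero when the lag
    matches the period."""
    x = np.asarray(x, float)
    return np.array([np.mean(np.abs(x[k:] - x[:-k]))
                     for k in range(1, max_lag + 1)])

# Synthetic "voiced" signal with a period of exactly 25 samples.
n = np.arange(400)
x = np.sin(2 * np.pi * n / 25)
d = amdf(x, 40)
period = int(np.argmin(d)) + 1  # lag of the deepest dip
print(period)  # -> 25
```

In a speaker-identification front end the AMDF dip location serves as a pitch estimate, complementing the spectral-envelope information carried by the LPC coefficients.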
Abstract: Speech corpus is one of the major components in a
Speech Processing System where one of the primary requirements
is to recognize an input sample. The quality and details captured
in speech corpus directly affects the precision of recognition. The
current work proposes a platform for speech corpus generation using
an adaptive LMS filter and LPC cepstrum, as a part of an ANN
based Speech Recognition System which is exclusively designed to
recognize isolated numerals of Assamese, a major language
in the North-Eastern part of India. The work focuses on designing an
optimal feature extraction block and a few ANN based cooperative
architectures so that the performance of the Speech Recognition
System can be improved.
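The adaptive LMS filtering stage mentioned above can be sketched as a standard least-mean-squares noise canceller. This is a generic sketch, not the described system's implementation; the filter order, step size and the toy signals are illustrative assumptions:

```python
import numpy as np

def lms_filter(d, x, order=8, mu=0.01):
    """Standard LMS adaptive filter.

    d : desired signal (e.g. noisy speech)
    x : reference input correlated with the noise in d
    Returns the error signal e = d - y, which approximates the
    cleaned signal once the weights have converged.
    """
    d, x = np.asarray(d, float), np.asarray(x, float)
    w = np.zeros(order)               # adaptive filter weights
    e = np.zeros(len(d))
    for n in range(order - 1, len(d)):
        u = x[n - order + 1:n + 1][::-1]  # most recent reference samples
        y = np.dot(w, u)                  # filter output (noise estimate)
        e[n] = d[n] - y                   # error = cleaned estimate
        w += 2 * mu * e[n] * u            # LMS weight update
    return e

# Toy example: a sinusoidal "speech" tone buried in noise that the
# reference input can predict; the error signal converges to the tone.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
d = np.sin(0.05 * np.arange(4000)) + noise
e = lms_filter(d, noise, order=4, mu=0.01)
```

After convergence the residual `e` is close to the clean tone, which is the sense in which an LMS stage can pre-clean recordings before LPC cepstrum extraction for the corpus.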