Abstract: A Malay speech-to-text system converts Malay speech into text. Because Malay speech recognition systems are still limited, this paper investigates recognition performance on ten Malay words obtained from online Malay news. The methodology consists of three stages: preprocessing, feature extraction, and speech classification. In the preprocessing stage, the speech samples are filtered using pre-emphasis. Features are then extracted from the samples using Mel Frequency Cepstrum Coefficients (MFCC). Lastly, speech classification is performed using a Feedforward Neural Network (FFNN), and classification accuracy is further investigated as a function of the hidden layer size. In the experiments, the classifier with 40 hidden neurons achieved the highest classification rate, 94%.
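The pre-emphasis and MFCC stages described above can be sketched in Python with NumPy as follows. This is a minimal sketch, not the authors' implementation. The pre-emphasis coefficient (0.97), the 400-sample frame and 160-sample hop (25 ms and 10 ms at 16 kHz), the Hamming window, the 26 mel filters, and the 13 retained coefficients are common textbook defaults assumed here. The FFNN classification stage is not shown.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """MFCC matrix of shape (n_frames, n_ceps); signal must be >= frame_len samples."""
    x = pre_emphasis(signal)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)                  # windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft  # periodogram per frame
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(energies + 1e-10), n_ceps)          # log mel energies -> cepstrum
```

For a one-second, 16 kHz recording this yields a 98 x 13 matrix. That matrix, or a fixed-length summary of it, would be the FFNN input.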
Abstract: This paper presents a road vehicle detection approach for intelligent transportation systems. The approach mainly uses a low-cost magnetic sensor and an associated data collection system to collect magnetic signals. The system can measure changes in the magnetic field, and it can also detect and count vehicles. We extend Mel Frequency Cepstral Coefficients to analyze vehicle magnetic signals. Vehicle type features are extracted using cepstrum, frame energy, and gap cepstrum representations of the magnetic signals. We design a two-dimensional map algorithm using Vector Quantization to classify vehicle magnetic features into four typical types of vehicles in Australian suburbs: sedan, van, truck, and bus. Experimental results show that our approach achieves a high level of accuracy for vehicle detection and classification.
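Vector-quantization classification of this kind can be sketched as a minimum-distortion rule. Each vehicle class has its own codebook, and a sequence of feature vectors is assigned to the class whose codebook quantizes it with the smallest average error. This is a generic VQ classifier assumed for illustration. It is not the paper's two-dimensional map algorithm, and the toy two-dimensional codebooks below are hypothetical rather than trained on magnetic data.

```python
import numpy as np

def avg_distortion(features, codebook):
    """Mean squared distance from each feature vector to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def classify(features, codebooks):
    """Return the label whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda label: avg_distortion(features, codebooks[label]))

# Hypothetical 2-D feature codebooks, one per vehicle type.
codebooks = {
    "sedan": np.array([[0.0, 0.0], [0.5, 0.5]]),
    "van": np.array([[2.0, 1.0], [2.5, 1.5]]),
    "truck": np.array([[4.0, 3.0], [4.5, 3.5]]),
    "bus": np.array([[6.0, 5.0], [6.5, 5.5]]),
}
observed = np.array([[2.1, 1.1], [2.4, 1.4]])
print(classify(observed, codebooks))  # van
```

In practice each codebook would be trained on labelled feature vectors of its class, for example with k-means or LBG.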
Abstract: Speaker Identification (SI) is the task of establishing the
identity of an individual based on his/her voice characteristics. The SI
task is typically achieved by two-stage signal processing: training and
testing. The training process calculates speaker-specific feature
parameters from the speech and generates speaker models
accordingly. In the testing phase, speech samples from unknown
speakers are compared with the models and classified. Even though the
performance of speaker identification systems has improved due to
recent advances in speech processing techniques, there is still a need
for improvement. In this paper, a Closed-Set Text-Independent Speaker
Identification System (CISI) based on a Multiple Classifier System
(MCS) is proposed, using Mel Frequency Cepstrum Coefficients
(MFCC) for feature extraction and a suitable combination of vector
quantization (VQ) and Gaussian Mixture Model (GMM), together
with the Expectation Maximization (EM) algorithm, for speaker
modeling. The use of a Voice Activity Detector (VAD) with a hybrid
approach based on Short Time Energy (STE) and Statistical
Modeling of Background Noise in the pre-processing step of
feature extraction yields a better and more robust automatic speaker
identification system. In addition, using the Linde-Buzo-Gray (LBG)
clustering algorithm to initialize the GMM parameters in the EM
step improved the convergence rate and system performance. The
system also uses a relative index as a confidence measure when
the GMM and VQ identification results contradict each other.
Simulation results on the voxforge.org speech database, obtained
using MATLAB, highlight the efficacy of the proposed method
compared to earlier work.
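The LBG algorithm grows a VQ codebook by repeatedly splitting every codeword into two and refining the result with k-means (Lloyd) iterations. The final codewords can then serve as initial GMM means before EM. The sketch assumes a codebook size that is a power of two. It splits each codeword by adding and subtracting a small offset proportional to the per-dimension standard deviation, a common variant of the classic (1 ± ε) split. The parameter names and defaults are illustrative and are not taken from the paper.

```python
import numpy as np

def lbg(data, size, eps=0.01, tol=1e-6, max_iter=100):
    """Linde-Buzo-Gray codebook design; size is assumed to be a power of two."""
    data = np.asarray(data, dtype=float)
    codebook = data.mean(axis=0, keepdims=True)     # start from the global centroid
    delta = eps * data.std(axis=0)                  # splitting offset per dimension
    while codebook.shape[0] < size:
        # Split: every codeword becomes a slightly perturbed pair.
        codebook = np.vstack([codebook + delta, codebook - delta])
        prev = np.inf
        for _ in range(max_iter):
            # Nearest-codeword assignment (squared Euclidean distance).
            d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            distortion = d[np.arange(len(data)), nearest].mean()
            # Centroid update; empty cells keep their previous codeword.
            for j in range(codebook.shape[0]):
                members = data[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
            # Stop when the relative drop in distortion is small.
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook
```

On two well-separated clusters, a size-2 codebook converges to the two cluster centroids. Those centroids are the kind of starting means an EM-trained GMM could be initialized from.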
Abstract: This paper presents an algorithm for reconstructing the phase and magnitude responses of a system's impulse response when only the output data are available. The system is driven by an unobserved zero-mean, independent and identically distributed (i.i.d.) non-Gaussian sequence, and the additive noise is assumed to be Gaussian. This problem is important in many practical applications across science and engineering, such as biomedical, seismic, and speech signal processing. The method is based on evaluating the bicepstrum of the third-order statistics of the observed output data. Simulation results are presented that demonstrate the performance of the method.
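The bicepstral approach starts from the third-order cumulant of the observed output. The bispectrum is the two-dimensional Fourier transform of that cumulant, and the bicepstrum is derived from the logarithm of the bispectrum. All cumulants of order higher than two vanish for a Gaussian process, so in expectation this statistic is unaffected by the additive Gaussian noise. The sketch shows only a direct (biased) cumulant estimator under a zero-mean assumption. The function name is hypothetical, and the paper's bicepstrum evaluation and reconstruction steps are not reproduced.

```python
import numpy as np

def third_order_cumulant(x, max_lag):
    """Biased estimate of c3(t1, t2) = E[x(n) x(n+t1) x(n+t2)] for a zero-mean signal.

    Returns a (2*max_lag+1, 2*max_lag+1) array indexed by (t1 + max_lag, t2 + max_lag).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # enforce the zero-mean assumption
    n = len(x)
    lags = range(-max_lag, max_lag + 1)
    c3 = np.zeros((2 * max_lag + 1, 2 * max_lag + 1))
    for i, t1 in enumerate(lags):
        for j, t2 in enumerate(lags):
            # Keep only indices k where k, k+t1 and k+t2 all lie inside the record.
            start = max(0, -t1, -t2)
            stop = min(n, n - t1, n - t2)
            k = np.arange(start, stop)
            c3[i, j] = np.sum(x[k] * x[k + t1] * x[k + t2]) / n
    return c3
```

The estimate is symmetric in (t1, t2). Its two-dimensional FFT gives a bispectrum estimate, from which a bicepstrum could be computed.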
Abstract: This paper presents an ESN-based Arabic phoneme
recognition system trained with supervised, forced and combined
supervised/forced supervised learning algorithms. Mel-Frequency
Cepstrum Coefficients (MFCCs) and Linear Predictive Coding (LPC)
techniques are used and compared as the input feature extraction
technique. The system is evaluated using 6 speakers from the King
Abdulaziz Arabic Phonetics Database (KAPD) for the Saudi Arabian
dialect and 34 speakers from the Center for Spoken Language
Understanding (CSLU2002) database of speakers with different
dialects from 12 Arabic countries. Results for the KAPD and
CSLU2002 Arabic databases show phoneme recognition
rates of 72.31% and 38.20%, respectively.
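An echo state network (ESN) keeps a fixed, randomly initialized recurrent reservoir and trains only a linear readout. The sketch shows that structure with NumPy. The reservoir weights are rescaled to an assumed spectral radius of 0.9, the states follow x(t) = tanh(W_in u(t) + W x(t-1)), and the readout is fitted by ridge regression. Here u(t) would be one MFCC or LPC feature frame and the targets one-hot phoneme labels. The sizes, scaling, and ridge constant are hypothetical, and the paper's forced and combined learning schemes are not shown.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random input and recurrent weights; W is rescaled to the given spectral radius."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(inputs, w_in, w):
    """Collect states x(t) = tanh(W_in u(t) + W x(t-1)) for an input sequence u."""
    x = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(w_in @ u + w @ x)
        states.append(x)
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Ridge-regression readout W_out such that states @ W_out approximates targets."""
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n), states.T @ targets)
```

Only W_out is learned. A phoneme decision for a frame would be the arg-max of states @ W_out.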
Abstract: This study investigates digestive diseases using sound as the detection medium. After preprocessing, the extracted signal is registered in the cepstral domain. After the digestive diseases are classified, the system selects random samples based on their features and generates the nonstationary, long-term signals of interest via the inverse cepstral transform; these are presented in digital and audio form as the output. The structure is updatable; in other words, when a new signal is received, the corresponding disease classification is updated in the feature domain.
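The forward and inverse transforms between a signal frame and the cepstral domain can be sketched as follows. The sketch assumes the real cepstrum, which is the inverse FFT of the log-magnitude spectrum. The real cepstrum discards phase, so resynthesizing a time-domain frame requires choosing a phase spectrum. The helper names and the eps floor are illustrative assumptions, not the study's method.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """c[n] = IFFT(log|FFT(x)|): the real cepstrum of a frame."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real

def magnitude_from_cepstrum(c):
    """Invert the cepstral transform back to a magnitude spectrum |X[k]|."""
    return np.exp(np.fft.fft(c).real)

def resynthesize(c, phase):
    """Rebuild a time-domain frame from a cepstrum and a chosen phase spectrum."""
    return np.fft.ifft(magnitude_from_cepstrum(c) * np.exp(1j * phase)).real
```

Supplying the original phase recovers the frame, up to the eps floor. Supplying a random phase instead produces a new signal with the same magnitude spectrum.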
Abstract: Research on damage to gears and gear pairs using
vibration signals remains very attractive, because vibration signals
from a gear pair are complex in nature and not easy to interpret.
Predicting gear pair defects by analyzing changes in the vibration
signals of gear pairs in operation is a very reliable method. Therefore, a
suitable vibration signal processing technique is necessary to extract
defect information generally obscured by the noise from dynamic
factors of other gear pairs. This article presents the value of cepstrum
analysis in vehicle gearbox fault diagnosis. Cepstrum represents the
overall power content of a whole family of harmonics and sidebands
when more than one family of sidebands is present at the same time.
The concepts behind the measurement and analysis involved in using the
technique are briefly outlined. Cepstrum analysis is used for detection
of an artificial pitting defect in a vehicle gearbox loaded with
different speeds and torques. The test stand is equipped with three
dynamometers; the input dynamometer serves as the internal
combustion engine, the output dynamometers introduce the load on
the flanges of the output joint shafts. The pitting defect is
manufactured on the tooth side of the fifth-speed gear on the
secondary shaft. A method for diagnosing gear faults based on the
order cepstrum is also presented. The procedure is illustrated with
the experimental vibration data of the vehicle gearbox. The results
show the effectiveness of cepstrum analysis in detection and
diagnosis of the gear condition.
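A synthetic, noise-free signal shows why cepstrum analysis suits gearbox vibration. A family of sidebands spaced at the shaft rotation frequency around the gear-mesh frequency puts a periodic ripple into the log spectrum. The power cepstrum collapses that ripple into a single peak at the quefrency equal to the shaft period. The sample rate, tooth count, shaft speed, and number of sidebands below are hypothetical. The sketch assumes constant speed and does not implement the order-cepstrum method described above.

```python
import numpy as np

def power_cepstrum(x, eps=1e-6):
    """Real cepstrum of the power spectrum: IFFT(log(|FFT(x)|^2 + eps))."""
    spectrum = np.abs(np.fft.fft(x)) ** 2
    return np.fft.ifft(np.log(spectrum + eps)).real

def dominant_quefrency(x, fs, q_min, q_max):
    """Quefrency (s) of the largest cepstral value between q_min and q_max seconds."""
    c = power_cepstrum(x)
    lo, hi = int(q_min * fs), int(q_max * fs)
    return (lo + np.argmax(c[lo:hi])) / fs

fs = 4000                      # sample rate (Hz), hypothetical
t = np.arange(fs) / fs         # 1 s record
f_shaft, teeth = 25.0, 20      # hypothetical shaft speed (Hz) and tooth count
f_mesh = teeth * f_shaft       # 500 Hz gear-mesh frequency
# Mesh component plus 10 sidebands on each side, spaced at the shaft frequency.
x = sum(np.cos(2 * np.pi * (f_mesh + k * f_shaft) * t) for k in range(-10, 11))
q = dominant_quefrency(x, fs, 0.01, 0.06)
print(q)  # 0.04, i.e. a 1 / 0.04 s = 25 Hz sideband spacing
```

The peak at 0.04 s identifies the whole sideband family, and hence the shaft carrying the modulation, with one number. Reading the individual spectral lines would not do this as directly.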
Abstract: This paper presents the cepstral and trispectral
analysis of speech signals produced by normal men, men with
defective hearing (deaf, profoundly deaf), and men affected by
tracheotomy; the trispectral analysis is based on parametric
(autoregressive, AR) methods using the fourth-order cumulant. These
analyses are used to detect and compare the pitch and the formants
of the corresponding voiced sounds (vowels \a\, \i\, and \u\). The first
results appear promising: after several experiments, there seems
to be no deformation of the spectrum, as one might have supposed
at the beginning. However, these pathologies influence two
characteristics: defective hearing influences the formants, whereas
tracheotomy influences the fundamental frequency (pitch).
Abstract: In this study, the use of silicon NAM (Non-Audible
Murmur) microphone in automatic speech recognition is presented.
NAM microphones are special acoustic sensors, which are attached
behind the talker's ear and can capture not only normal (audible)
speech, but also very quietly uttered speech (non-audible murmur).
As a result, NAM microphones can be applied in automatic speech
recognition systems when privacy is desired in human-machine communication.
Moreover, NAM microphones show robustness against
noise and they might be used in special systems (speech recognition,
speech conversion etc.) for sound-impaired people. Using a small
amount of training data and adaptation approaches, 93.9% word
accuracy was achieved for a 20k Japanese vocabulary dictation
task. Non-audible murmur recognition in noisy environments is also
investigated. In this study, the NAM speech is further analyzed
using distance measures between hidden Markov model
(HMM) pairs. Using a metric distance, NAM speech is shown to occupy
a reduced spectral space; however, the locations of the different
NAM phonemes are similar to the locations of the corresponding phonemes
of normal speech, and the NAM sounds are well discriminated.
Promising results using nonlinear features are also presented,
especially under noisy conditions.
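A distance between HMM pairs can be sketched under a strong simplifying assumption. Both models have the same number of states, each state emits a single diagonal-covariance Gaussian, and states are compared one-to-one. The symmetric Kullback-Leibler divergence between corresponding state Gaussians is then averaged. This is an illustrative divergence-based distance with hypothetical helper names. It is not necessarily the measure used in the study.

```python
import numpy as np

def sym_kl_diag(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between diagonal Gaussians."""
    mu_p, var_p, mu_q, var_q = (np.asarray(a, dtype=float) for a in (mu_p, var_p, mu_q, var_q))
    d2 = (mu_p - mu_q) ** 2
    # The log-determinant terms of the two KL directions cancel.
    return 0.5 * np.sum((var_p + d2) / var_q + (var_q + d2) / var_p - 2.0)

def hmm_distance(states_a, states_b):
    """Average state-by-state symmetric KL between two HMMs.

    Each HMM is a list of (mean, variance) pairs, one diagonal Gaussian per state.
    """
    return np.mean([sym_kl_diag(ma, va, mb, vb)
                    for (ma, va), (mb, vb) in zip(states_a, states_b)])
```

Two unit-variance one-dimensional states whose means differ by 1 are at distance 1. Identical models are at distance 0.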
Abstract: A speech corpus is one of the major components of a
speech processing system, one of whose primary requirements
is to recognize an input sample. The quality and detail captured
in the speech corpus directly affect the precision of recognition. The
current work proposes a platform for speech corpus generation using
an adaptive LMS filter and LPC cepstrum, as part of an ANN-based
speech recognition system designed exclusively to
recognize isolated numerals of the Assamese language, a major language
in the North Eastern part of India. The work focuses on designing an
optimal feature extraction block and a few ANN-based cooperative
architectures so that the performance of the speech recognition
system can be improved.
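LPC-cepstrum features can be computed in three steps. The autocorrelation method and the Levinson-Durbin recursion give the predictor coefficients, and the standard recursion then converts them to the cepstral coefficients of the all-pole model 1/(1 - Σ a_k z^-k). The sketch assumes a Hamming-windowed frame, a prediction order of 12, and 12 cepstral coefficients, with the gain term c0 omitted. These are common choices rather than values from the paper, and the adaptive LMS filtering stage is not shown.

```python
import numpy as np

def autocorrelation(frame, order):
    """r[k] = sum_n x[n] x[n+k] for k = 0..order (autocorrelation method)."""
    x = np.asarray(frame, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Predictor coefficients a[1..p] with x[n] ~ sum_k a[k] x[n-k], plus residual energy."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of 1 / (1 - sum_k a[k] z^-k); a holds a_1..a_p."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_{k} (k/n) c_k a_{n-k}, with a_m = 0 for m > p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lpc_cepstrum(frame, order=12, n_ceps=12):
    """LPC cepstral features of one Hamming-windowed frame."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    a, _ = levinson_durbin(autocorrelation(w, order), order)
    return lpc_to_cepstrum(a, n_ceps)
```

As a check, a first-order model with a_1 = 0.5 has cepstrum c_n = 0.5^n / n. Accordingly, lpc_to_cepstrum([0.5], 3) returns [0.5, 0.125, 0.041666...].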