Scholarly

Neural Network Based Speech to Text in Malay Language

Year: 2019 Volume: 13 Issue: 11 599 - 602 Pages

Abstract: Speech to text in Malay language is a system that converts Malay speech into text. The Malay language recognition system is still limited, thus, this paper aims to investigate the performance of ten Malay words obtained from the online Malay news. The methodology consists of three stages, which are preprocessing, feature extraction, and speech classification. In preprocessing stage, the speech samples are filtered using pre emphasis. After that, feature extraction method is applied to the samples using Mel Frequency Cepstrum Coefficient (MFCC). Lastly, speech classification is performed using Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated based on the hidden layer size. From experimentation, the classifier with 40 hidden neurons shows the highest classification rate which is 94%.

Road Vehicle Recognition Using Magnetic Sensing Feature Extraction and Classification

Year: 2018 Volume: 12 Issue: 4 270 - 275 Pages

Abstract: This paper presents a road vehicle detection approach for the intelligent transportation system. This approach mainly uses low-cost magnetic sensor and associated data collection system to collect magnetic signals. This system can measure the magnetic field changing, and it also can detect and count vehicles. We extend Mel Frequency Cepstral Coefficients to analyze vehicle magnetic signals. Vehicle type features are extracted using representation of cepstrum, frame energy, and gap cepstrum of magnetic signals. We design a 2-dimensional map algorithm using Vector Quantization to classify vehicle magnetic features to four typical types of vehicles in Australian suburbs: sedan, VAN, truck, and bus. Experiments results show that our approach achieves a high level of accuracy for vehicle detection and classification.

An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Year: 2014 Volume: 8 Issue: 10 1949 - 1958 Pages

Abstract: Speaker Identification (SI) is the task of establishing identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still need of improvement. In this paper, a Closed-Set Tex-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficient (MFCC) as feature extraction and suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with Expectation Maximization algorithm (EM) for speaker modeling. The use of Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. Also investigation of Linde-Buzo-Gray (LBG) clustering algorithm for initialization of GMM, for estimating the underlying parameters, in the EM step improved the convergence rate and systems performance. It also uses relative index as confidence measures in case of contradiction in identification process by GMM and VQ as well. Simulation results carried out on voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.

Signal Reconstruction Using Cepstrum of Higher Order Statistics

Year: 2007 Volume: 1 Issue: 6 290 - 295 Pages

Abstract: This paper presents an algorithm for reconstructing phase and magnitude responses of the impulse response when only the output data are available. The system is driven by a zero-mean independent identically distributed (i.i.d) non-Gaussian sequence that is not observed. The additive noise is assumed to be Gaussian. This is an important and essential problem in many practical applications of various science and engineering areas such as biomedical, seismic, and speech processing signals. The method is based on evaluating the bicepstrum of the third-order statistics of the observed output data. Simulations results are presented that demonstrate the performance of this method.

Echo State Networks for Arabic Phoneme Recognition

Year: 2013 Volume: 7 Issue: 7 1021 - 1027 Pages

Abstract: This paper presents an ESN-based Arabic phoneme recognition system trained with supervised, forced and combined supervised/forced supervised learning algorithms. Mel-Frequency Cepstrum Coefficients (MFCCs) and Linear Predictive Code (LPC) techniques are used and compared as the input feature extraction technique. The system is evaluated using 6 speakers from the King Abdulaziz Arabic Phonetics Database (KAPD) for Saudi Arabia dialectic and 34 speakers from the Center for Spoken Language Understanding (CSLU2002) database of speakers with different dialectics from 12 Arabic countries. Results for the KAPD and CSLU2002 Arabic databases show phoneme recognition performances of 72.31% and 38.20% respectively.

Long-Term Simulation of Digestive Sound Signals by CEPSTRAL Technique

Year: 2007 Volume: 1 Issue: 2 91 - 95 Pages

Abstract: In this study, an investigation over digestive diseases has been done in which the sound acts as a detector medium. Pursue to the preprocessing the extracted signal in cepstrum domain is registered. After classification of digestive diseases, the system selects random samples based on their features and generates the interest nonstationary, long-term signals via inverse transform in cepstral domain which is presented in digital and sonic form as the output. This structure is updatable or on the other word, by receiving a new signal the corresponding disease classification is updated in the feature domain.

Vehicle Gearbox Fault Diagnosis Based On Cepstrum Analysis

Year: 2014 Volume: 8 Issue: 9 1530 - 1536 Pages

Abstract: Research on damage of gears and gear pairs using vibration signals remains very attractive, because vibration signals from a gear pair are complex in nature and not easy to interpret. Predicting gear pair defects by analyzing changes in vibration signal of gears pairs in operation is a very reliable method. Therefore, a suitable vibration signal processing technique is necessary to extract defect information generally obscured by the noise from dynamic factors of other gear pairs.This article presents the value of cepstrum analysis in vehicle gearbox fault diagnosis. Cepstrum represents the overall power content of a whole family of harmonics and sidebands when more than one family of sidebands is present at the same time. The concept for the measurement and analysis involved in using the technique are briefly outlined. Cepstrum analysis is used for detection of an artificial pitting defect in a vehicle gearbox loaded with different speeds and torques. The test stand is equipped with three dynamometers; the input dynamometer serves asthe internal combustion engine, the output dynamometers introduce the load on the flanges of the output joint shafts. The pitting defect is manufactured on the tooth side of a gear of the fifth speed on the secondary shaft. Also, a method for fault diagnosis of gear faults is presented based on order Cepstrum. The procedure is illustrated with the experimental vibration data of the vehicle gearbox. The results show the effectiveness of Cepstrum analysis in detection and diagnosis of the gear condition.

Trispectral Analysis of Voiced Sounds Defective Audition and Tracheotomisian Cases

Year: 2007 Volume: 1 Issue: 9 1308 - 1312 Pages

Authors:
H. Maalem
F. Marir

Abstract: This paper presents the cepstral and trispectral analysis of a speech signal produced by normal men, men with defective audition (deaf, deep deaf) and others affected by tracheotomy, the trispectral analysis based on parametric methods (Autoregressive AR) using the fourth order cumulant. These analyses are used to detect and compare the pitches and the formants of corresponding voiced sounds (vowel \a\, \i\ and \u\). The first results appear promising, since- it seems after several experimentsthere is no deformation of the spectrum as one could have supposed it at the beginning, however these pathologies influenced the two characteristics: The defective audition influences to the formants contrary to the tracheotomy, which influences the fundamental frequency (pitch).

Using Teager Energy Cepstrum and HMM distancesin Automatic Speech Recognition and Analysis of Unvoiced Speech

Year: 2009 Volume: 3 Issue: 11 1935 - 1941 Pages

Authors:
Panikos Heracleous

Abstract: In this study, the use of silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented. NAM microphones are special acoustic sensors, which are attached behind the talker-s ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech conversion etc.) for sound-impaired people. Using a small amount of training data and adaptation approaches, 93.9% word accuracy was achieved for a 20k Japanese vocabulary dictation task. Non-audible murmur recognition in noisy environments is also investigated. In this study, further analysis of the NAM speech has been made using distance measures between hidden Markov model (HMM) pairs. It has been shown the reduced spectral space of NAM speech using a metric distance, however the location of the different phonemes of NAM are similar to the location of the phonemes of normal speech, and the NAM sounds are well discriminated. Promising results in using nonlinear features are also introduced, especially under noisy conditions.

Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture

Year: 2009 Volume: 3 Issue: 4 858 - 867 Pages

Abstract: Speech corpus is one of the major components in a Speech Processing System where one of the primary requirements is to recognize an input sample. The quality and details captured in speech corpus directly affects the precision of recognition. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as a part of an ANN based Speech Recognition System which is exclusively designed to recognize isolated numerals of Assamese language- a major language in the North Eastern part of India. The work focuses on designing an optimal feature extraction block and a few ANN based cooperative architectures so that the performance of the Speech Recognition System can be improved.

Keywords:
Filter
Feature
LMS
LPC
Cepstrum
ANN.

Top Journal

SUGGEST A JOURNAL