Speaker Recognition Using LIRA Neural Networks

This article contains information from our investigation in the field of voice recognition. For this purpose, we created a voice database that contains different phrases in two languages, English and Spanish, for men and women. As a classifier, the LIRA (Limited Receptive Area) grayscale neural classifier was selected. The LIRA grayscale neural classifier was developed for image recognition tasks and demonstrated good results. Therefore, we decided to develop a recognition system using this classifier for voice recognition. From a specific set of speakers, we can recognize the speaker’s voice. For this purpose, the system uses spectrograms of the voice signals as input to the system, extracts the characteristics and identifies the speaker. The results are described and analyzed in this article. The classifier can be used for speaker identification in security system or smart buildings for different types of intelligent devices.

Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principals of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleargrams as the image. In this paper atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arraignment or of the atomic indices in the T-F vector are used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speak has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between then probabilities from the training sentences and the test sentences. Training is done at a 30dB Signal-to-Noise Ratio (SNR). Testing is performed at SNR’s of 0 dB, 5 dB, 10 dB and 30dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and produces ~93% accuracy at 0dB SNR.

A Study of Visual Attention in Diagnosing Cerebellar Tumours

Visual attention allows user to select the most relevant information to ongoing behaviour. This paper presents a study on; i) the performance of people measurements, ii) accurateness of people measurement of the peaks that correspond to chemical quantities from the Magnetic Resonance Spectroscopy (MRS) graphs and iii) affects of people measurements to the algorithm-based diagnosis. Participant-s eye-movement was recorded using eye-tracker tool (Eyelink II). This experiment involves three participants for examining 20 MRS graphs to estimate the peaks of chemical quantities which indicate the abnormalities associated with Cerebellar Tumours (CT). The status of each MRS is verified by using decision algorithm. Analysis involves determination of humans-s eye movement pattern in measuring the peak of spectrograms, scan path and determining the relationship of distributions of fixation durations with the accuracy of measurement. In particular, the eye-tracking data revealed which aspects of the spectrogram received more visual attention and in what order they were viewed. This preliminary investigation provides a proof of concept for use of the eye tracking technology as the basis for expanded CT diagnosis.

SySRA: A System of a Continuous Speech Recognition in Arab Language

We report in this paper the model adopted by our system of continuous speech recognition in Arab language SySRA and the results obtained until now. This system uses the database Arabdic-10 which is a corpus of word for the Arab language and which was manually segmented. Phonetic decoding is represented by an expert system where the knowledge base is translated in the form of production rules. This expert system transforms a vocal signal into a phonetic lattice. The higher level of the system takes care of the recognition of the lattice thus obtained by deferring it in the form of written sentences (orthographical Form). This level contains initially the lexical analyzer which is not other than the module of recognition. We subjected this analyzer to a set of spectrograms obtained by dictating a score of sentences in Arab language. The rate of recognition of these sentences is about 70% which is, to our knowledge, the best result for the recognition of the Arab language. The test set consists of twenty sentences from four speakers not having taken part in the training.

Multiclass Support Vector Machines for Environmental Sounds Classification Using log-Gabor Filters

In this paper we propose a robust environmental sound classification approach, based on spectrograms features driven from log-Gabor filters. This approach includes two methods. In the first methods, the spectrograms are passed through an appropriate log-Gabor filter banks and the outputs are averaged and underwent an optimal feature selection procedure based on a mutual information criteria. The second method uses the same steps but applied only to three patches extracted from each spectrogram. To investigate the accuracy of the proposed methods, we conduct experiments using a large database containing 10 environmental sound classes. The classification results based on Multiclass Support Vector Machines show that the second method is the most efficient with an average classification accuracy of 89.62 %.

Study on Performance of Wigner Ville Distribution for Linear FM and Transient Signal Analysis

This research paper presents some methods to assess the performance of Wigner Ville Distribution for Time-Frequency representation of non-stationary signals, in comparison with the other representations like STFT, Spectrogram etc. The simultaneous timefrequency resolution of WVD is one of the important properties which makes it preferable for analysis and detection of linear FM and transient signals. There are two algorithms proposed here to assess the resolution and to compare the performance of signal detection. First method is based on the measurement of area under timefrequency plot; in case of a linear FM signal analysis. A second method is based on the instantaneous power calculation and is used in case of transient, non-stationary signals. The implementation is explained briefly for both methods with suitable diagrams. The accuracy of the measurements is validated to show the better performance of WVD representation in comparison with STFT and Spectrograms.