Improved Text-Independent Speaker Identification using Fused MFCC and IMFCC Feature Sets based on Gaussian Filter

A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for speech-related applications. In a recent contribution by the authors, it was shown that the Inverted Mel-Frequency Cepstral Coefficients (IMFCC) form a useful feature set for SI, containing complementary information present in the high-frequency region. This paper introduces a Gaussian-shaped filter (GF) for calculating MFCC and IMFCC in place of the typical triangular-shaped bins. The objective is to introduce a higher amount of correlation between subband outputs. The performance of both MFCC and IMFCC improves with GF over the conventional triangular filter (TF) based implementation, individually as well as in combination. With GMM as the speaker modeling paradigm, the performance of the proposed GF-based MFCC and IMFCC in individual and fused modes has been verified on two standard databases, YOHO (microphone speech) and POLYCOST (telephone speech), each of which has more than 130 speakers.
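The effect of the filter shape on subband correlation can be illustrated with a minimal sketch. The boundary frequencies and the choice of the Gaussian spread (half the distance to the next centre) are assumptions made for illustration only; the abstract does not state the authors' parameters.

```python
import math

def triangular_weight(f, left, center, right):
    """Weight of a triangular filter that is zero at both neighbouring centres."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)

def gaussian_weight(f, center, sigma):
    """Weight of a Gaussian-shaped filter centred on the same frequency."""
    return math.exp(-((f - center) ** 2) / (2.0 * sigma ** 2))

# Hypothetical filter: centre at 400 Hz, neighbouring centres at 300 Hz and 520 Hz.
left, center, right = 300.0, 400.0, 520.0
sigma = (right - center) / 2.0  # assumed spread, not taken from the paper

# Response of this filter at the centre of its right-hand neighbour.
tf_at_neighbour = triangular_weight(right, left, center, right)
gf_at_neighbour = gaussian_weight(right, center, sigma)
```

Under these assumptions the triangular filter contributes nothing at the neighbouring centre, while the Gaussian filter still has a non-zero tail there, which is one way adjacent subband outputs become more correlated.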

Practical Method for Digital Music Matching Robust to Various Sound Qualities

In this paper, we propose a practical digital music matching system that is robust to variation in sound qualities. The proposed system is subdivided into two parts: client and server. The client part consists of the input, preprocessing and feature extraction modules. The preprocessing module, including the music onset module, revises the value gap occurring on the time axis between identical songs of different formats. The proposed method uses delta-grouped Mel frequency cepstral coefficients (MFCCs) to extract music features that are robust to changes in sound quality. According to the number of sound quality formats (SQFs) used, a music server is constructed with a feature database (FD) that contains different sub feature databases (SFDs). When the proposed system receives a music file, the selection module selects an appropriate SFD from the feature database; the selected SFD is subsequently used by the matching module. In this study, we used 3,000 queries for matching experiments in three cases with different FDs. In each case, we used 1,000 queries constructed by mixing 8 SQFs and 125 songs. The success rate of music matching improved from 88.6% when using a single SFD to 93.2% when using four SFDs. This experiment demonstrates that the proposed method is robust to various sound qualities.

Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

This study proposes an emotional speech recognition system for smart phone applications that can be combined with 3G mobile communications and social networks to provide users and their groups with more interaction and care. This study developed a mechanism using support vector machines (SVM) to recognize emotions in speech such as happiness, anger, sadness and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features and divides various parameters into the categories of energy and frequency for training. In this study, 28 commonly used acoustic features, including pitch and volume, were used for training. In addition, a time-frequency parameter obtained by continuous wavelet transforms was also used to identify the accent and intonation in a sentence during the recognition process. The Berlin Database of Emotional Speech was used, with the speech divided into male and female data sets for training. According to the experimental results, the accuracies of the male and female test sets increased by 4.6% and 5.2%, respectively, after using the time-frequency parameter for classifying happy and angry emotions. For the classification of all emotions, the average accuracy, including male and female data, was 63.5% for the test set and 90.9% for the whole data set.

Recognition of Isolated Handwritten Latin Characters using One Continuous Route of Freeman Chain Code Representation and Feedforward Neural Network Classifier

In a handwriting recognition problem, characters can be represented using chain codes. The main problem in representing characters using chain codes is optimizing the length of the chain code. This paper proposes using a randomized algorithm to minimize the length of the Freeman Chain Codes (FCC) generated from isolated handwritten characters. A feedforward neural network is used in the classification stage to recognize the image characters. Our test results show that, by applying the proposed model, we reached a relatively high accuracy for the problem of isolated handwritten character recognition when tested on the NIST database.
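As a minimal sketch of what a Freeman chain code is, the snippet below encodes a given pixel route as 8-direction codes. It assumes the common convention of 0 = east with codes increasing counter-clockwise and the y axis pointing up; it does not implement the randomized route-minimization described in the abstract, and the example path is hypothetical.

```python
# Map each unit step (dx, dy) to its 8-direction Freeman code.
# Assumed convention: 0 = east, counter-clockwise, y axis pointing up.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def freeman_chain_code(path):
    """Encode a route of 8-connected pixel coordinates as a list of codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-connected")
        codes.append(DIRECTIONS[step])
    return codes

# Hypothetical route through five pixels of a character stroke.
route = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
code = freeman_chain_code(route)
```

A route of n pixels always yields n - 1 codes, so a shorter continuous route through the character gives a shorter chain code.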

Comparison of MFCC and Cepstral Coefficients as a Feature Set for PCG Biometric Systems

Heart sound is an acoustic signal, and many techniques used nowadays for human recognition tasks borrow from speech recognition. One popular choice for feature extraction from acoustic signals is the Mel Frequency Cepstral Coefficients (MFCC), which map the signal onto a non-linear Mel scale that mimics human hearing. However, the Mel scale is almost linear in the frequency region of heart sounds and thus should produce results similar to the standard cepstral coefficients (CC). In this paper, MFCC is investigated to see whether it produces superior results for a PCG-based human identification system compared to CC. Results show that the MFCC system is still superior to CC despite linear filter-banks in the lower frequency range, giving up to a 95% correct recognition rate for MFCC and 90% for CC. Further experiments show that the high recognition rate is due to the implementation of filter-banks and not to Mel scaling.
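The near-linearity claim can be checked numerically with a small sketch. It assumes the widely used formula mel = 2595 log10(1 + f/700) and treats 20 to 200 Hz as a representative heart-sound band; neither the formula variant nor the band limits are stated in the abstract.

```python
import math

def hz_to_mel(f_hz):
    """Common Hz-to-mel mapping (assumed; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def local_slope(f_hz, delta=1.0):
    """Finite-difference slope of the mel curve, in mel per Hz."""
    return (hz_to_mel(f_hz + delta) - hz_to_mel(f_hz)) / delta

# Slopes inside an assumed heart-sound band and at speech-range frequencies.
low_band = [local_slope(f) for f in (20.0, 100.0, 200.0)]
high_band = [local_slope(f) for f in (2000.0, 4000.0)]

# How much the slope changes within the low band, and from 200 Hz to 4 kHz.
low_band_variation = low_band[0] / low_band[-1]
wide_band_variation = low_band[-1] / high_band[-1]
```

A perfectly linear scale would give a variation of exactly 1; the closer `low_band_variation` is to 1 relative to `wide_band_variation`, the more the mel mapping behaves like a plain linear frequency axis over that band.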

Computer Aided X-Ray Diffraction Intensity Analysis for Spinels: Hands-On Computing Experience

The mineral with the chemical formula MgAl2O4 is called “spinel”. Ferrites that crystallize in the spinel structure are known as spinel-ferrites or ferro-spinels. The spinel structure has an fcc cage of oxygen ions, and the metallic cations are distributed among tetrahedral (A) and octahedral (B) interstitial voids (sites). The X-ray diffraction (XRD) intensity of each Bragg plane is sensitive to the distribution of cations in the interstitial voids of the spinel lattice. This leads to a method for determining the distribution of cations in spinel oxides through XRD intensity analysis. The computer program for XRD intensity analysis has been developed in the C language and tested against a real experimental situation by synthesizing the spinel ferrite materials Mg0.6Zn0.4AlxFe2-xO4 and characterizing them by X-ray diffractometry. The compositions of Mg0.6Zn0.4AlxFe2-xO4 (x = 0.0 to 0.6) ferrites were prepared by the ceramic method, and powder X-ray diffraction patterns were recorded. The authenticity of the program is thus checked by comparing the theoretically calculated data from computer simulation with the experimental ones. Further, the deduced cation distributions were used to fit the magnetization data using the localized canting of spins approach to explain the “recovery” of the collinear spin structure due to Al3+ substitution in Mg-Zn ferrites, which is a case of A-site magnetic dilution and non-collinear spin structure. Since the distribution of cations in spinel ferrites plays a very important role in their electrical and magnetic properties, it is essential to determine the cation distribution in the spinel lattice.

New Triangle-Ring UWB Bandpass Filter with Sharp Roll-Off and Dual Notched Bands

This paper presents a new ultra-wideband (UWB) bandpass filter (BPF) with sharp roll-off and dual-notched bands. The filter consists of a triangle ring multi-mode resonator (MMR) with a stub-loaded resonator (SLR) for controlling the two transmission zeros at 2.8 / 11 GHz, and an embedded open-circuited stub and asymmetric tightly coupled input/output (I/O) lines for introducing the dual notched bands at 5.2 / 6.8 GHz. The attenuation slopes at the lower and higher passband edges of the proposed filter are 160 and 153 dB/GHz, respectively. This study mainly provides a simple method to design a UWB bandpass filter with high passband selectivity and dual-notched bands that satisfies the Federal Communications Commission (FCC) defined indoor UWB specification.

Comparison of Parameterization Methods in Recognizing Spoken Arabic Digits

This paper evaluates sound parameterization methods for recognizing some spoken Arabic words, namely the digits from zero to nine. Each isolated spoken word is represented by a single template based on a specific recognition feature, and the recognition is based on the Euclidean distance from those templates. The performance analysis of recognition is based on four parameterization features: the Burg Spectrum Analysis, the Walsh Spectrum Analysis, the Thomson Multitaper Spectrum Analysis and the Mel Frequency Cepstral Coefficients (MFCC) features. The main aim of this paper is to compare, analyze, and discuss the outcomes of spoken Arabic digit recognition systems based on the selected recognition features. The results acquired confirm that the use of MFCC features is a very promising method for recognizing spoken Arabic digits.
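The single-template, Euclidean-distance decision rule described above can be sketched as follows. The three-dimensional feature vectors and the digit labels are hypothetical placeholders; a real system would use one of the four parameterizations named in the abstract to produce the vectors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(features, templates):
    """Return the label of the template closest to the query features."""
    return min(templates, key=lambda label: euclidean(features, templates[label]))

# Hypothetical templates, one per spoken digit (toy 3-dimensional features).
templates = {
    "zero": [1.0, 0.0, 0.5],
    "one": [0.0, 1.0, 0.5],
    "two": [0.5, 0.5, 1.0],
}
query = [0.9, 0.1, 0.4]
label = classify(query, templates)
```

Because every word has exactly one template, recognition costs one distance computation per vocabulary entry, which keeps the comparison between feature types simple.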

Voice Command Recognition System Based on MFCC and VQ Algorithms

The goal of this project is to design a system to recognize voice commands. Most voice recognition systems contain two main modules: “feature extraction” and “feature matching”. In this project, the MFCC algorithm is used to implement the feature extraction module. Using this algorithm, the cepstral coefficients are calculated on the mel frequency scale. The VQ (vector quantization) method is used to reduce the amount of data and thereby decrease computation time. In the feature matching stage, Euclidean distance is applied as the similarity criterion. Because of the high accuracy of the algorithms used, the accuracy of this voice command system is high. Using these algorithms, with at least 5 repetitions of each command in a single training session and then two repetitions in each testing session, a zero error rate in command recognition is achieved.
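The feature matching stage can be sketched as choosing the command whose VQ codebook gives the lowest average quantization distortion. This is a minimal sketch under assumptions: the codebooks are taken as already trained (the training procedure is not shown), and the two-dimensional vectors and command names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    total = sum(min(euclidean(frame, word) for word in codebook) for frame in frames)
    return total / len(frames)

def recognize(frames, codebooks):
    """Pick the command whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))

# Hypothetical pre-trained codebooks with two codewords per command.
codebooks = {
    "open": [[0.0, 0.0], [1.0, 1.0]],
    "close": [[5.0, 5.0], [6.0, 4.0]],
}
# Hypothetical feature frames extracted from one spoken command.
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1]]
command = recognize(frames, codebooks)
```

Storing a few codewords per command instead of every training frame is what reduces the amount of data compared at recognition time.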

Synthesis and Thermoelectric Behavior in Nanoparticles of Doped Co Ferrites

Samples of CoFe2-xCrxO4, where x varies from 0.0 to 0.5, were prepared by the co-precipitation route. These samples were sintered at 750°C for 2 hours. The particles were characterized by X-ray diffraction (XRD) at room temperature. The FCC spinel structure was confirmed by the XRD patterns of the samples. The crystallite sizes of these particles were calculated from the most intense peak using the Scherrer formula. The crystallite sizes lie in the range of 37-60 nm. The lattice parameter was found to decrease upon substitution of Cr. DC electrical resistivity was measured as a function of temperature. The room temperature thermoelectric power was measured for the prepared samples. The magnitude of the Seebeck coefficient depends on the composition and resistivity of the samples.
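The Scherrer estimate mentioned above is D = K * lambda / (beta * cos(theta)), where beta is the peak width (FWHM) in radians and theta is half the diffraction angle. The sketch below assumes a shape factor K = 0.9 and Cu K-alpha radiation (0.15406 nm); the peak position and width are hypothetical values, since the abstract reports none of these inputs.

```python
import math

def scherrer_size_nm(wavelength_nm, fwhm_deg, two_theta_deg, shape_factor=0.9):
    """Crystallite size from the Scherrer formula D = K*lambda / (beta*cos(theta))."""
    beta = math.radians(fwhm_deg)             # peak width (FWHM) in radians
    theta = math.radians(two_theta_deg / 2.0)  # Bragg angle is half of 2-theta
    return shape_factor * wavelength_nm / (beta * math.cos(theta))

# Hypothetical most-intense peak: 2-theta = 35.5 degrees, FWHM = 0.2 degrees,
# measured with assumed Cu K-alpha radiation.
size_nm = scherrer_size_nm(wavelength_nm=0.15406, fwhm_deg=0.2, two_theta_deg=35.5)
```

The size is inversely proportional to the peak width, so broader peaks indicate smaller crystallites; instrumental broadening is ignored in this sketch.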

Speaker Identification Using Admissible Wavelet Packet Based Decomposition

Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for speech recognition as well as speaker recognition. In the MFCC feature representation, the Mel frequency scale is used to get a high resolution in the low frequency region and a low resolution in the high frequency region. This kind of processing is good for obtaining stable phonetic information, but not suitable for speaker features that are located in high frequency regions. The speaker-specific information, which is non-uniformly distributed in the high frequencies, is equally important for speaker recognition. Based on this fact, we propose an admissible wavelet packet based filter structure for speaker identification. The multiresolution capabilities of the wavelet packet transform are used to derive the new features. The proposed scheme differs from previous wavelet based works mainly in the design of the filter structure. Unlike others, the proposed filter structure does not follow the Mel scale. The closed-set speaker identification experiments performed on the TIMIT database show improved identification performance compared to other commonly used Mel scale based filter structures using wavelets.

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.
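One way to picture a complementary (flipped) filter bank is to mirror the mel-spaced centre frequencies about the middle of the band, so that the dense filters move to the high-frequency end. The sketch below assumes the common mel formula, 20 filters and a 4 kHz upper limit; these values and the exact flipping rule are illustrative assumptions, not the authors' stated design.

```python
import math

def hz_to_mel(f_hz):
    """Common Hz-to-mel mapping (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(n_filters, f_max):
    """Centre frequencies (Hz) equally spaced on the mel scale between 0 and f_max."""
    step = hz_to_mel(f_max) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]

def flipped_centers(centers, f_max):
    """Mirror the centres about f_max / 2 so resolution is highest near f_max."""
    return [f_max - c for c in reversed(centers)]

F_MAX = 4000.0  # assumed upper band edge (8 kHz sampling)
mel_bank = mel_centers(20, F_MAX)
flipped_bank = flipped_centers(mel_bank, F_MAX)
```

In the mel bank the spacing between centres grows with frequency, while in the flipped bank it shrinks, which is why the two feature sets can carry complementary information.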

Spectral Analysis of Speech: A New Technique

ICA, which is generally used for the blind source separation problem, has been tested for feature extraction in a speech recognition system to replace the phoneme-based approach of MFCC. Feeding the generated cepstral coefficients to ICA, with cepstral analysis as preprocessing, yields a new signal processing approach. This gives much better results than MFCC or ICA alone, for both word and speaker recognition. As expected, the mixing matrix A differs before and after MFCC, since Mel is a nonlinear scale. However, cepstral coefficients generated from Linear Predictive Coefficients, being independent, prove to be the right candidate for ICA. Matlab is the tool used for all comparisons. The database used consists of samples from ISOLET.

UWB Bowtie Slot Antenna for Breast Cancer Detection

UWB is a very attractive technology for many applications. It provides many advantages such as fine resolution and high power efficiency. Our interest in the current study is the use of the UWB radar technique in microwave medical imaging systems, especially for early breast cancer detection. The Federal Communications Commission (FCC) allowed a frequency bandwidth of 3.1 to 10.6 GHz for this purpose. In this paper we suggest a UWB bowtie slot antenna with enhanced bandwidth. The effects of varying the geometry of the antenna on its performance and bandwidth are studied. The proposed antenna is simulated in CST Microwave Studio. Details of the antenna design and simulation results such as return loss and radiation patterns are discussed in this paper. The final antenna structure exhibits good UWB characteristics and surpasses the bandwidth requirements.

Musical Instrument Classification Using Embedded Hidden Markov Models

In this paper, a novel method for recognizing musical instruments in polyphonic music is presented, using an embedded hidden Markov model (EHMM). The EHMM is a doubly embedded HMM structure where each state of the external HMM is an independent HMM. The classification is accomplished for two different internal HMM structures, where GMMs are used as likelihood estimators for the internal HMMs. The results are compared to those achieved by an artificial neural network with two hidden layers. Appropriate classification accuracies were achieved both for solo instrument performances and for instrument combinations, which demonstrates that the new approach outperforms similar classification methods by exploiting the dynamics of the signal.

Demonstration of a Low-Cost Monocycle Pulse for UWB Radio Transceiver

This paper presents a simple and original method for the generation of short monocycle pulses based on the transient response of a passive band-pass filter. The recorded sub-nanosecond pulses show good symmetry and small ringing (13% of the peak amplitude). Their spectral density covers the range 3.1 GHz to 10.6 GHz. The possibility of adapting the pulse spectral density to the indoor FCC frequency mask is demonstrated with a prototype working at a reduced frequency (FCC/1000). A detection technique is proposed.

Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System

Unified Speech Audio Coding (USAC), the latest MPEG standard for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcomings of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which extracts only 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than the traditional 13 scales of MFCCs and uses an Iterative Dichotomiser 3 (ID3) decision tree rather than other complex learning methods; thus the proposed algorithm has lower computational complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss in the recovered audio signal, this paper also designs a modified subsequent process to help the whole classification system reach an accuracy rate as high as 97%, which is comparable to the classical 99%.
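ID3 grows its tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection step, on a hypothetical toy set in which each MFCC value has already been discretized into "low" or "high"; the attribute names, the discretization and the data are assumptions, not the paper's features.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the data on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical frames with two discretized MFCC attributes.
rows = [
    {"mfcc1": "high", "mfcc2": "low"},
    {"mfcc1": "high", "mfcc2": "high"},
    {"mfcc1": "low", "mfcc2": "low"},
    {"mfcc1": "low", "mfcc2": "high"},
]
labels = ["orchestra", "orchestra", "percussion", "percussion"]

# ID3 would place the attribute with the largest gain at the root of the tree.
best_attribute = max(rows[0], key=lambda a: information_gain(rows, labels, a))
```

With only a handful of attributes to score at each node, both training and classification stay cheap, which is consistent with the low-complexity motivation given in the abstract.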