Abstract: A state of the art Speaker Identification (SI) system
requires a robust feature extraction unit followed by a speaker
modeling scheme for generalized representation of these features.
Over the years, Mel-Frequency Cepstral Coefficients (MFCC)
modeled on the human auditory system has been used as a standard
acoustic feature set for speech related applications. On a recent
contribution by authors, it has been shown that the Inverted Mel-
Frequency Cepstral Coefficients (IMFCC) is useful feature set for
SI, which contains complementary information present in high
frequency region. This paper introduces the Gaussian shaped filter
(GF) while calculating MFCC and IMFCC in place of typical
triangular shaped bins. The objective is to introduce a higher
amount of correlation between subband outputs. The performances
of both MFCC & IMFCC improve with GF over conventional
triangular filter (TF) based implementation, individually as well as
in combination. With GMM as speaker modeling paradigm, the
performances of proposed GF based MFCC and IMFCC in
individual and fused mode have been verified in two standard
databases YOHO, (Microphone Speech) and POLYCOST
(Telephone Speech) each of which has more than 130 speakers.
Abstract: In this paper, we propose a practical digital music matching system that is robust to variation in sound qualities. The proposed system is subdivided into two parts: client and server. The client part consists of the input, preprocessing and feature extraction modules. The preprocessing module, including the music onset module, revises the value gap occurring on the time axis between identical songs of different formats. The proposed method uses delta-grouped Mel frequency cepstral coefficients (MFCCs) to extract music features that are robust to changes in sound quality. According to the number of sound quality formats (SQFs) used, a music server is constructed with a feature database (FD) that contains different sub feature databases (SFDs). When the proposed system receives a music file, the selection module selects an appropriate SFD from a feature database; the selected SFD is subsequently used by the matching module. In this study, we used 3,000 queries for matching experiments in three cases with different FDs. In each case, we used 1,000 queries constructed by mixing 8 SQFs and 125 songs. The success rate of music matching improved from 88.6% when using single a single SFD to 93.2% when using quadruple SFDs. By this experiment, we proved that the proposed method is robust to various sound qualities.
Abstract: An emotional speech recognition system for the
applications on smart phones was proposed in this study to combine
with 3G mobile communications and social networks to provide users
and their groups with more interaction and care. This study developed
a mechanism using the support vector machines (SVM) to recognize
the emotions of speech such as happiness, anger, sadness and normal.
The mechanism uses a hierarchical classifier to adjust the weights of
acoustic features and divides various parameters into the categories of
energy and frequency for training. In this study, 28 commonly used
acoustic features including pitch and volume were proposed for
training. In addition, a time-frequency parameter obtained by
continuous wavelet transforms was also used to identify the accent and
intonation in a sentence during the recognition process. The Berlin
Database of Emotional Speech was used by dividing the speech into
male and female data sets for training. According to the experimental
results, the accuracies of male and female test sets were increased by
4.6% and 5.2% respectively after using the time-frequency parameter
for classifying happy and angry emotions. For the classification of all
emotions, the average accuracy, including male and female data, was
63.5% for the test set and 90.9% for the whole data set.
Abstract: In a handwriting recognition problem, characters can
be represented using chain codes. The main problem in representing
characters using chain code is optimizing the length of the chain
code. This paper proposes to use randomized algorithm to minimize
the length of Freeman Chain Codes (FCC) generated from isolated
handwritten characters. Feedforward neural network is used in the
classification stage to recognize the image characters. Our test results
show that by applying the proposed model, we reached a relatively
high accuracy for the problem of isolated handwritten when tested on
NIST database.
Abstract: Heart sound is an acoustic signal and many techniques
used nowadays for human recognition tasks borrow speech recognition
techniques. One popular choice for feature extraction of accoustic
signals is the Mel Frequency Cepstral Coefficients (MFCC) which
maps the signal onto a non-linear Mel-Scale that mimics the human
hearing. However the Mel-Scale is almost linear in the frequency
region of heart sounds and thus should produce similar results with
the standard cepstral coefficients (CC). In this paper, MFCC is
investigated to see if it produces superior results for PCG based
human identification system compared to CC. Results show that the
MFCC system is still superior to CC despite linear filter-banks in
the lower frequency range, giving up to 95% correct recognition rate
for MFCC and 90% for CC. Further experiments show that the high
recognition rate is due to the implementation of filter-banks and not
from Mel-Scaling.
Abstract: The mineral having chemical compositional formula MgAl2O4 is called “spinel". The ferrites crystallize in spinel structure are known as spinel-ferrites or ferro-spinels. The spinel structure has a fcc cage of oxygen ions and the metallic cations are distributed among tetrahedral (A) and octahedral (B) interstitial voids (sites). The X-ray diffraction (XRD) intensity of each Bragg plane is sensitive to the distribution of cations in the interstitial voids of the spinel lattice. This leads to the method of determination of distribution of cations in the spinel oxides through XRD intensity analysis. The computer program for XRD intensity analysis has been developed in C language and also tested for the real experimental situation by synthesizing the spinel ferrite materials Mg0.6Zn0.4AlxFe2- xO4 and characterized them by X-ray diffractometry. The compositions of Mg0.6Zn0.4AlxFe2-xO4(x = 0.0 to 0.6) ferrites have been prepared by ceramic method and powder X-ray diffraction patterns were recorded. Thus, the authenticity of the program is checked by comparing the theoretically calculated data using computer simulation with the experimental ones. Further, the deduced cation distributions were used to fit the magnetization data using Localized canting of spins approach to explain the “recovery" of collinear spin structure due to Al3+ - substitution in Mg-Zn ferrites which is the case if A-site magnetic dilution and non-collinear spin structure. Since the distribution of cations in the spinel ferrites plays a very important role with regard to their electrical and magnetic properties, it is essential to determine the cation distribution in spinel lattice.
Abstract: This paper presents a new ultra-wideband (UWB) bandpass filter (BPF) with sharp roll-off and dual-notched bands. The filter consists of a triangle ring multi-mode resonator (MMR) with the stub-loaded resonator (SLR) for controlling the two transmission zeros at 2.8 / 11 GHz, the embedded open-circuited stub and the asymmetric tight coupled input/output (I/O) lines for introducing the dual notched bands at 5.2 / 6.8 GHz. The attenuation slope in the lower and higher passband edges of the proposed filter show 160- and 153-dB/GHz, respectively. This study mainly provides a simple method to design a UWB bandpass filter with high passband selectivity and dual-notched bands for satisfying the Federal Communications Commission (FCC-defined) indoor UWB specification
Abstract: This paper proposes evaluation of sound parameterization methods in recognizing some spoken Arabic words, namely digits from zero to nine. Each isolated spoken word is represented by a single template based on a specific recognition feature, and the recognition is based on the Euclidean distance from those templates. The performance analysis of recognition is based on four parameterization features: the Burg Spectrum Analysis, the Walsh Spectrum Analysis, the Thomson Multitaper Spectrum Analysis and the Mel Frequency Cepstral Coefficients (MFCC) features. The main aim of this paper was to compare, analyze, and discuss the outcomes of spoken Arabic digits recognition systems based on the selected recognition features. The results acqired confirm that the use of MFCC features is a very promising method in recognizing Spoken Arabic digits.
Abstract: The goal of this project is to design a system to
recognition voice commands. Most of voice recognition systems
contain two main modules as follow “feature extraction" and “feature
matching". In this project, MFCC algorithm is used to simulate
feature extraction module. Using this algorithm, the cepstral
coefficients are calculated on mel frequency scale. VQ (vector
quantization) method will be used for reduction of amount of data to
decrease computation time. In the feature matching stage Euclidean
distance is applied as similarity criterion. Because of high accuracy
of used algorithms, the accuracy of this voice command system is
high. Using these algorithms, by at least 5 times repetition for each
command, in a single training session, and then twice in each testing
session zero error rate in recognition of commands is achieved.
Abstract: Samples of CoFe2-xCrxO4 where x varies from 0.0 to 0.5 were prepared by co-precipitation route. These samples were sintered at 750°C for 2 hours. These particles were characterized by X-ray diffraction (XRD) at room temperature. The FCC spinel structure was confirmed by XRD patterns of the samples. The crystallite sizes of these particles were calculated from the most intense peak by Scherrer formula. The crystallite sizes lie in the range of 37-60 nm. The lattice parameter was found decreasing upon substitution of Cr. DC electrical resistivity was measured as a function of temperature. The room temperature thermoelectric power was measured for the prepared samples. The magnitude of Seebeck coefficient depends on the composition and resistivity of the samples.
Abstract: Mel Frequency Cepstral Coefficient (MFCC) features
are widely used as acoustic features for speech recognition as well
as speaker recognition. In MFCC feature representation, the Mel frequency
scale is used to get a high resolution in low frequency region,
and a low resolution in high frequency region. This kind of processing
is good for obtaining stable phonetic information, but not suitable
for speaker features that are located in high frequency regions. The
speaker individual information, which is non-uniformly distributed
in the high frequencies, is equally important for speaker recognition.
Based on this fact we proposed an admissible wavelet packet based
filter structure for speaker identification. Multiresolution capabilities
of wavelet packet transform are used to derive the new features.
The proposed scheme differs from previous wavelet based works,
mainly in designing the filter structure. Unlike others, the proposed
filter structure does not follow Mel scale. The closed-set speaker
identification experiments performed on the TIMIT database shows
improved identification performance compared to other commonly
used Mel scale based filter structures using wavelets.
Abstract: A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.
Abstract: ICA which is generally used for blind source separation
problem has been tested for feature extraction in Speech recognition
system to replace the phoneme based approach of MFCC. Applying
the Cepstral coefficients generated to ICA as preprocessing has
developed a new signal processing approach. This gives much better
results against MFCC and ICA separately, both for word and speaker
recognition. The mixing matrix A is different before and after MFCC
as expected. As Mel is a nonlinear scale. However, cepstrals
generated from Linear Predictive Coefficient being independent
prove to be the right candidate for ICA. Matlab is the tool used for
all comparisons. The database used is samples of ISOLET.
Abstract: UWB is a very attractive technology for many
applications. It provides many advantages such as fine resolution and high power efficiency. Our interest in the current study is the use of
UWB radar technique in microwave medical imaging systems, especially for early breast cancer detection. The Federal Communications Commission FCC allowed frequency bandwidth of
3.1 to 10.6 GHz for this purpose. In this paper we suggest an UWB Bowtie slot antenna with enhanced bandwidth. Effects of varying the geometry of the antenna
on its performance and bandwidth are studied. The proposed antenna
is simulated in CST Microwave Studio. Details of antenna design and
simulation results such as return loss and radiation patterns are discussed in this paper. The final antenna structure exhibits good
UWB characteristics and has surpassed the bandwidth requirements.
Abstract: In this paper, a novel method for recognition of musical
instruments in a polyphonic music is presented by using an
embedded hidden Markov model (EHMM). EHMM is a doubly
embedded HMM structure where each state of the external HMM
is an independent HMM. The classification is accomplished for
two different internal HMM structures where GMMs are used as
likelihood estimators for the internal HMMs. The results are compared
to those achieved by an artificial neural network with two
hidden layers. Appropriate classification accuracies were achieved
both for solo instrument performance and instrument combinations
which demonstrates that the new approach outperforms the similar
classification methods by means of the dynamic of the signal.
Abstract: This paper presents a simple and original method for
the generation of short monocycle pulses based on the transient
response of a passive band-pass filter. The recorded sub-nanosecond
pulses show a good symmetry and a small ringing (13 % of the peak
amplitude). Their spectral density covers the range 3.1 GHz to
10.6 GHz. The possibility to adapt the pulse spectral density to the
indoor FCC frequency mask is demonstrated with a prototype
working at a reduced frequency (FCC/1000). A detection technique is
proposed.
Abstract: Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.