Abstract: Arenga pinnata is an abundant natural fiber that can be used as a soundproofing material. However, scientific data on the acoustic properties of Arenga pinnata are not yet available. In this study, the sound absorption of pure Arenga pinnata was measured. The sample thickness was varied: 10 mm, 20 mm, 30 mm, and 40 mm. This work was carried out to investigate the potential of Arenga pinnata fiber as a raw material for sound-absorbing material. The impedance tube method was used to measure the sound absorption coefficient (α). The measurements were done in accordance with ASTM E1050-98, the standard test method for impedance and absorption of acoustical materials using a tube, two microphones, and a digital frequency analysis system. The results showed that the sound absorption coefficients of Arenga pinnata were good from 2000 Hz to 5000 Hz, within the range of 0.75 to 0.90. The optimum sound absorption coefficient was obtained at a thickness of 40 mm. These results indicate that Arenga pinnata fiber is a promising raw material for sound-absorbing material that is low-cost, lightweight, and biodegradable.
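As a rough illustration of how the two-microphone impedance tube method turns a measured transfer function into α, the following sketch applies the usual transfer-function relation R = (H12 - e^{-jks}) / (e^{jks} - H12) · e^{2jk·x1} and α = 1 - |R|². The function names, the speed of sound (343 m/s), and the tube geometry used in the usage note are assumptions made for illustration; they are not taken from the study.

```python
import cmath
import math


def absorption_coefficient(h12, freq_hz, mic_spacing_m, x1_m, c=343.0):
    """Normal-incidence absorption coefficient from a two-microphone transfer function.

    h12           -- complex transfer function p2 / p1 (mic 2 is nearer the sample)
    mic_spacing_m -- distance s between the two microphones
    x1_m          -- distance from the sample face to the farther microphone (mic 1)
    c             -- assumed speed of sound in the tube, in m/s
    """
    k = 2.0 * math.pi * freq_hz / c                   # wavenumber
    h_incident = cmath.exp(-1j * k * mic_spacing_m)   # transfer function of the incident wave alone
    h_reflected = cmath.exp(1j * k * mic_spacing_m)   # transfer function of the reflected wave alone
    reflection = (h12 - h_incident) / (h_reflected - h12) * cmath.exp(2j * k * x1_m)
    return 1.0 - abs(reflection) ** 2


def synthetic_h12(reflection, freq_hz, mic_spacing_m, x1_m, c=343.0):
    """Hypothetical helper: transfer function a sample with a known reflection factor would produce."""
    k = 2.0 * math.pi * freq_hz / c
    x2_m = x1_m - mic_spacing_m
    p1 = cmath.exp(1j * k * x1_m) + reflection * cmath.exp(-1j * k * x1_m)
    p2 = cmath.exp(1j * k * x2_m) + reflection * cmath.exp(-1j * k * x2_m)
    return p2 / p1
```

For example, feeding `synthetic_h12(0.4 + 0.2j, 2000.0, 0.02, 0.10)` back into `absorption_coefficient` with the same geometry recovers α = 1 - |0.4 + 0.2j|² = 0.8, up to rounding.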
Abstract: This paper describes an Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of convolutive mixtures of speech picked up by a linear microphone array. The proposed algorithm extracts independent sources by non-Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relative performance of the algorithm under random initialization and null beamformer (NBF) based initialization is studied. It has been found that an NBF-based initial value gives faster convergence as well as better separation performance.
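To make the role of negentropy concrete, here is a minimal sketch of a widely used negentropy proxy, J(y) ≈ (E[G(y)] - E[G(ν)])² with G = log cosh and ν a standard Gaussian. It works on real-valued samples only; the paper's algorithm operates on time-frequency series of speech, which this sketch does not attempt to reproduce. All function names are assumptions made for illustration.

```python
import math


def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def gaussian_expectation(g, half_width=8.0, steps=4000):
    """E[g(v)] for v ~ N(0, 1), computed by trapezoidal integration."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total * h


GAUSSIAN_REFERENCE = gaussian_expectation(log_cosh)


def standardize(samples):
    """Shift and scale samples to zero mean and unit variance."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in samples) / n)
    return [(v - mean) / std for v in samples]


def negentropy_proxy(samples):
    """Approximate negentropy: close to 0 for Gaussian data, larger for non-Gaussian data."""
    y = standardize(samples)
    mean_g = sum(log_cosh(v) for v in y) / len(y)
    return (mean_g - GAUSSIAN_REFERENCE) ** 2
```

A fixed-point ICA iteration searches for the projection of the whitened mixtures that maximizes such a measure, and a deflationary scheme repeats this one source at a time while keeping each new projection orthogonal to those already found.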
Abstract: This research work is aimed at speech recognition
using scaly neural networks. A small vocabulary of 11 words was
established first: "word, file, open, print, exit, edit,
cut, copy, paste, doc1, doc2". The chosen words are associated with
executing computer functions such as opening a file, printing a
text document, cutting, copying, pasting, editing, and exiting.
Each word is recorded on the computer and then subjected to a feature
extraction process using LPC (linear prediction coefficients). These
features are used as input to an artificial neural network in
speaker-dependent mode. Half of the words are used for training the
artificial neural network and the other half are used for testing the
system; those are used for information retrieval.
The system consists of three parts: speech processing and feature
extraction, training and testing using neural networks, and
information retrieval.
The retrieval process proved to be 79.5-88% successful, which is
quite acceptable considering the variation in the surroundings, the
state of the speaker, and the microphone type.
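The LPC feature extraction step can be sketched as follows, using the autocorrelation method with the Levinson-Durbin recursion. This is one common way to compute linear prediction coefficients and is an assumption here; the abstract does not say which LPC variant, frame length, or model order was used. The function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n-k].

    Solved with the Levinson-Durbin recursion on the frame's autocorrelation.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        raise ValueError("frame has zero energy; LPC is undefined")
    a = [0.0] * (order + 1)   # a[0] is unused
    error = r[0]              # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error       # reflection coefficient of stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:]
```

In a recognizer of this kind, the coefficients from each frame of a spoken word would be collected into the feature vector presented to the neural network.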
Abstract: In this paper, the estimation accuracy of multiple moving
talker tracking using a microphone array is improved. Tracking
is achieved by an adaptive method that integrates two algorithms: the PAST (Projection Approximation Subspace
Tracking) algorithm and the IPLS (Interior Point Least Square) algorithm. When either talker begins to speak again after a silent
period, an appropriate feasible region for the evaluation function of
the IPLS algorithm might not be set, and tracking then fails due to incorrect updating. Therefore, if an increase in the number of
active talkers is detected, the feasible region must be reset. A low-cost realization is required for high-speed tracking, and a
high-accuracy realization is desired for precise tracking. In this paper,
directions roughly estimated using the delay-and-sum array method
are used for the resetting. The results of several experiments performed in
an actual room environment show the effectiveness of the proposed method.
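The rough direction estimate used for the reset can be illustrated with a narrowband delay-and-sum scan over candidate angles. This is a simplified sketch under stated assumptions: a uniform linear array, a far-field source, a single frequency bin, and a speed of sound of 343 m/s. None of these details, nor the function names, come from the paper.

```python
import cmath
import math


def steering_vector(angle_deg, num_mics, spacing_m, freq_hz, c=343.0):
    """Phase factors of a far-field plane wave across a uniform linear array."""
    delay_per_mic = spacing_m * math.sin(math.radians(angle_deg)) / c
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_per_mic)
            for m in range(num_mics)]


def delay_and_sum_doa(snapshot, spacing_m, freq_hz, candidates_deg, c=343.0):
    """Return the candidate angle with the largest delay-and-sum output power.

    snapshot -- one complex value per microphone at the analysed frequency
    """
    def output_power(angle_deg):
        weights = steering_vector(angle_deg, len(snapshot), spacing_m, freq_hz, c)
        # Undo the assumed delays (conjugate phases) and sum across microphones.
        summed = sum(x * w.conjugate() for x, w in zip(snapshot, weights))
        return abs(summed) ** 2

    return max(candidates_deg, key=output_power)
```

The scan is cheap but coarse, which fits its use here as an initial guess for resetting the feasible region rather than as the final estimate.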
Abstract: The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity of integrating two radically different information capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor in order to simplify the hardware integration. By using on-chip image processing, this smart sensor can calculate the X/Y projections of the captured image in real time. This on-chip projection considerably reduces the volume of the output data. This data-volume reduction permits transmission of the condensed visual information via the same audio channel by using the stereophonic input available on most standard computing devices such as PCs, PDAs, and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised using standard 0.35 µm CMOS technology. A preliminary experiment gives encouraging results. Its efficiency will be further investigated in a large variety of applications such as biometrics, speech recognition in noisy environments, and vocal control for military or disabled persons.
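The data reduction obtained from X/Y projections can be seen in a small sketch. It assumes that a projection is the sum of pixel intensities along each column (X) and each row (Y); the abstract does not define the projection, so this is an illustrative reading rather than the sensor's actual on-chip operation.

```python
def xy_projections(image):
    """Return (x_projection, y_projection) of a 2-D image given as a list of rows.

    x_projection has one value per column, y_projection one value per row.
    """
    x_projection = [sum(column) for column in zip(*image)]
    y_projection = [sum(row) for row in image]
    return x_projection, y_projection


def reduction_factor(width, height):
    """Ratio of pixel count to projection sample count for a width x height image."""
    return (width * height) / (width + height)
```

Under this reading, a W x H image is condensed from W·H pixel values to W + H projection values, which is what makes transmission over an audio-rate channel plausible.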
Abstract: In this paper, an extension of the directionally constrained minimization of power (DCMP) algorithm to broadband signals is proposed. The DCMP algorithm is a useful technique for extracting a target signal from the observed signals of a microphone array system. In the DCMP algorithm, the output power of the microphone array is minimized under the constraint of constant responses to the directions of arrival (DOAs) of specific signals. In our algorithm, the computation time is reduced by limiting the directional constraint to the direction perpendicular to the sensor array.
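A single-constraint DCMP beamformer has the closed-form weights w = R⁻¹c / (cᴴR⁻¹c), where R is the array covariance matrix and c is the constraint (steering) vector. The sketch below assumes a linear array and a far-field source, in which case a wave arriving perpendicular to the array reaches all sensors simultaneously and c is a vector of ones at every frequency. Whether this is exactly how the paper obtains its speed-up is an assumption; the function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting (complex-safe)."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dcmp_weights_broadside(covariance):
    """Minimum-output-power weights with unit response perpendicular to a linear array."""
    constraint = [1.0] * len(covariance)   # all-ones steering vector (broadside)
    z = solve(covariance, constraint)      # z = R^-1 c
    denominator = sum(z)                   # c^H R^-1 c, since c is all ones
    return [value / denominator for value in z]


def output_power(weights, covariance):
    """w^H R w for a Hermitian covariance matrix."""
    n = len(weights)
    total = sum(weights[i].conjugate() * covariance[i][j] * weights[j]
                for i in range(n) for j in range(n))
    return total.real
```

With the constraint fixed to ones, only the covariance matrix changes from one frequency bin to the next, so no steering vector has to be recomputed per bin.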
Abstract: In this project, a tele-operated anthropomorphic
robotic arm and hand are designed and built as a versatile robotic arm
system. The robot can manipulate objects, for example in
pick-and-place operations. It can also function by itself in
standalone mode.
First, the robotic arm is built to interface with a personal
computer via a serial servo controller circuit board. The circuit board
enables the user to completely control the robotic arm and, moreover,
enables feedback from the user. The control circuit board uses a
powerful integrated microcontroller, a PIC (Programmable Interface
Controller). The PIC is programmed using BASIC (Beginner's
All-purpose Symbolic Instruction Code) and is used as the 'brain'
of the robot. In addition, a user-friendly Graphical User Interface
(GUI) is developed as the serial servo interface software using
Microsoft's Visual Basic 6.
The second part of the project is to use speech recognition to control
the robotic arm. A speech recognition circuit board is constructed
with onboard components such as a PIC and other integrated circuits. It
replaces the computer's Graphical User Interface. The robotic arm is
able to receive instructions as spoken commands through a
microphone and perform the corresponding operations, such as
picking and placing.
Abstract: Distant-talking voice-based HCI systems suffer from
performance degradation due to mismatch between the acoustic
speech (runtime) and the acoustic model (training). Mismatch is
caused by the change in the power of the speech signal as observed at
the microphones. This change is greatly influenced by the change in
distance, affecting speech dynamics inside the room before reaching
the microphones. Moreover, as the speech signal is reflected, its
acoustic characteristics are also altered by the room properties. In
general, power mismatch due to distance is a complex problem. This
paper presents a novel approach to dealing with distance-induced
mismatch by intelligently sensing instantaneous voice power variation
and compensating model parameters. First, the distant-talking speech
signal is processed through microphone array processing, and the
corresponding distance information is extracted. Distance-sensitive
Gaussian Mixture Models (GMMs), pre-trained to capture both
speech power and room properties, are used to predict the optimal
distance of the speech source. Consequently, pre-computed statistical
priors corresponding to the optimal distance are selected to correct
the statistics of the generic model, which was frozen during training.
Thus, model combinatorics are post-conditioned to match the power
of instantaneous speech acoustics at runtime. This results in an
improved likelihood of predicting the correct speech command at
farther distances. We conduct experiments using real data recorded
inside two rooms. Experimental evaluation shows that voice recognition
performance using our method is more robust to changes in distance
than the conventional approach. In our experiment, under the most
acoustically challenging environment (i.e., Room 2: 2.5 meters), our
method achieved a 24.2% improvement in recognition performance
over the best-performing conventional method.
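The distance prediction step can be pictured as a maximum-likelihood choice among distance-specific GMMs. The sketch below is a strong simplification: it assumes a one-dimensional feature (for instance a log-power value per frame) and GMMs given as lists of (weight, mean, variance) tuples. The feature, the model sizes, and every number in the usage note are hypothetical and are not taken from the paper.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of 1-D feature frames under a GMM.

    gmm -- list of (weight, mean, variance) tuples, weights summing to 1
    """
    total = 0.0
    for x in frames:
        density = sum(
            weight * math.exp(-0.5 * (x - mean) ** 2 / variance)
            / math.sqrt(2.0 * math.pi * variance)
            for weight, mean, variance in gmm
        )
        total += math.log(max(density, 1e-300))   # floor avoids log(0) on outliers
    return total


def select_distance(frames, distance_gmms):
    """Return the distance label whose GMM explains the frames best."""
    return max(distance_gmms,
               key=lambda label: gmm_log_likelihood(frames, distance_gmms[label]))
```

For instance, with hypothetical models `{0.5: [(1.0, -10.0, 4.0)], 1.5: [(0.6, -18.0, 4.0), (0.4, -16.0, 9.0)], 2.5: [(1.0, -25.0, 9.0)]}` and frames `[-17.5, -18.2, -16.0]`, the 1.5 m model is selected; the priors stored for that distance would then be used to correct the generic model.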
Abstract: Acoustic imaging based sound localization using a microphone
array is a challenging task in digital signal processing.
Discrete Fourier transform (DFT) based near-field acoustical holography
(NAH) is an important acoustical technique for sound source
localization and provides an efficient solution to the ill-posed problem.
However, in practice, due to the use of a small curtailed aperture
and the resulting significant spectral leakage, the DFT cannot
reconstruct the active-region-of-sound (AROS) effectively, especially
near the edges of the aperture. In this paper, we highlight the
fundamental problems of DFT-based NAH and provide a solution to the
spectral leakage effect by extrapolation based on linear predictive
coding and 2D Tukey windowing. This approach has been tested for
localizing single and multi-point sound sources. We observe that
incorporating the extrapolation technique increases the spatial
resolution and localization accuracy and reduces spectral leakage when
a small curtailed aperture with a small number of sensors is used.
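The windowing half of the approach can be sketched with a 2D Tukey (tapered cosine) window built as the outer product of two 1D Tukey windows. The separable construction and the taper fraction of 0.5 are assumptions made for illustration; the abstract does not give the window parameters, and the LPC-based extrapolation step is not shown.

```python
import math


def tukey(length, alpha=0.5):
    """1-D Tukey window: cosine tapers over a fraction alpha of the length, flat in between."""
    if length == 1 or alpha <= 0.0:
        return [1.0] * length
    edge = alpha * (length - 1) / 2.0       # taper width in samples at each end
    window = []
    for n in range(length):
        d = min(n, length - 1 - n)          # distance to the nearest end
        if d < edge:
            window.append(0.5 * (1.0 - math.cos(math.pi * d / edge)))
        else:
            window.append(1.0)
    return window


def tukey_2d(rows, cols, alpha=0.5):
    """Separable 2-D Tukey window as the outer product of two 1-D windows."""
    row_window = tukey(rows, alpha)
    col_window = tukey(cols, alpha)
    return [[r * c for c in col_window] for r in row_window]
```

Multiplying the measured (or extrapolated) pressure aperture by such a window before the 2D DFT tapers the data smoothly to zero at the aperture edges, which is the usual way windowing suppresses spectral leakage.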
Abstract: The present work addresses the problem of automatic enumeration and recognition of an unknown and time-varying number of environmental sound sources using a single microphone. The assumption made is that the recorded sound is a realization of sound sources belonging to a group of audio classes that is known a priori. We describe two variations of the same principle, which is to calculate the distance between the current unknown audio frame and all possible combinations of the classes that are assumed to span the sound scene. We concentrate on categorizing environmental sound sources, such as birds and insects, for the task of monitoring the biodiversity of a specific habitat.
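The principle of comparing a frame against every combination of known classes can be sketched as an exhaustive search. The sketch assumes that each class is represented by a fixed feature template, that simultaneous sources add in the feature domain, and that the Euclidean distance is used. These modelling choices, the function name, and the templates in the usage note are hypothetical; the two variations described in the paper may differ.

```python
from itertools import combinations
import math


def best_class_combination(frame, class_templates, max_sources):
    """Return the tuple of class names whose summed templates lie closest to the frame.

    frame           -- feature vector of the current audio frame
    class_templates -- dict mapping class name to a template vector
    max_sources     -- largest number of simultaneous sources considered
    """
    best_combo, best_distance = (), float("inf")
    names = sorted(class_templates)
    for size in range(1, max_sources + 1):
        for combo in combinations(names, size):
            mixture = [sum(values)
                       for values in zip(*(class_templates[name] for name in combo))]
            distance = math.dist(frame, mixture)
            if distance < best_distance:
                best_combo, best_distance = combo, distance
    return best_combo
```

The length of the returned tuple gives the estimated number of active sources and its entries give their classes, so enumeration and recognition come out of the same search. With templates `{"bird": [1, 0, 0], "insect": [0, 1, 0], "rain": [0, 0, 1]}` and the frame `[0.9, 1.1, 0.05]`, the search returns `("bird", "insect")`.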
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure that improves the distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.
Abstract: In this study, the use of a silicon NAM (Non-Audible
Murmur) microphone in automatic speech recognition is presented.
NAM microphones are special acoustic sensors that are attached
behind the talker's ear and can capture not only normal (audible)
speech, but also very quietly uttered speech (non-audible murmur).
As a result, NAM microphones can be applied in automatic speech
recognition systems when privacy is desired in human-machine communication.
Moreover, NAM microphones show robustness against
noise and they might be used in special systems (speech recognition,
speech conversion etc.) for sound-impaired people. Using a small
amount of training data and adaptation approaches, 93.9% word
accuracy was achieved for a 20k Japanese vocabulary dictation
task. Non-audible murmur recognition in noisy environments is also
investigated. In this study, further analysis of NAM speech has
been made using distance measures between hidden Markov model
(HMM) pairs. Using a metric distance, it is shown that the spectral
space of NAM speech is reduced; however, the locations of the different
phonemes of NAM speech are similar to the locations of the phonemes
of normal speech, and the NAM sounds are well discriminated.
Promising results using nonlinear features are also presented,
especially under noisy conditions.