Abstract: The practical implementation of audio-video coupled speech recognition systems is mainly limited by the hardware complexity of integrating two radically different information-capturing devices with good temporal synchronisation. In this paper, we propose a solution based on a smart CMOS image sensor that simplifies this hardware integration. Using on-chip image processing, the smart sensor computes in real time the X/Y projections of the captured image. This on-chip projection considerably reduces the volume of the output data, which permits transmission of the condensed visual information over the same audio channel, using the stereophonic input available on most standard computing devices such as PCs, PDAs and mobile phones. A prototype called VMIKE (Visio-Microphone) has been designed and realised in a standard 0.35 µm CMOS technology. A preliminary experiment gives encouraging results. Its efficiency will be further investigated in a large variety of applications, such as biometrics, speech recognition in noisy environments, and vocal control for military use or for disabled persons.
Abstract: The voice signal in a Voice over Internet Protocol (VoIP) system is carried over a best-effort IP network, which introduces network degradations including delay, packet loss and jitter. This paper presents the implementation of a finite impulse response (FIR) filter for voice quality improvement in the VoIP system using the distributed arithmetic (DA) algorithm. The VoIP simulations are conducted with the AMR-NB 6.70 kbps and G.729a speech coders at different packet loss rates, and the performance of the enhanced VoIP signal is evaluated using the Perceptual Evaluation of Speech Quality (PESQ) measure for narrowband signals. The results show a reduction in the computational complexity of the system and a significant improvement in the quality of the VoIP voice signal.
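The distributed-arithmetic idea can be illustrated with a small sketch: instead of multiplying each input sample by a filter coefficient, the filter precomputes a look-up table of partial coefficient sums and accumulates table entries bit by bit. This is a minimal illustration with hypothetical integer coefficients and unsigned inputs, not the paper's implementation:

```python
def build_lut(coeffs):
    # LUT entry p holds the sum of coefficients whose input bit is set in p.
    n = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (p >> j) & 1)
            for p in range(1 << n)]

def da_fir_output(samples, coeffs, bits=8):
    # Bit-serial DA: gather bit b of every sample into a LUT address,
    # look up the partial coefficient sum, and shift-accumulate over
    # all bit planes, so no multiplier is needed.
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        address = 0
        for j, x in enumerate(samples):
            address |= ((x >> b) & 1) << j
        acc += lut[address] << b
    return acc
```

For unsigned inputs this reproduces the plain inner product of samples and coefficients, e.g. `da_fir_output([1, 2, 3, 4], [1, 1, 1, 1])` equals 10, while replacing per-sample multiplies with table look-ups, which is the source of the complexity reduction.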
Abstract: M. Kemal Ataturk was a great leader who was fond of art, and he demonstrated this fondness many times. His speeches and writings show his approval of art and his view of the importance of art and artists for society. During the foundation of the republic, he also sought renovation in art, as in other fields, and ordered many novelties in both art and society. One of the greatest steps in realising this was the preparation of a national Turkish opera. This study examines how a Turkish opera, Özsoy, was prepared in the context of the social and political conditions of that time and what kind of process it went through. As a result, it is seen that Ataturk had two main aims with this opera. First, he wanted to end the sectarian conflict between Iran and Turkey that had gone on for centuries. The second, and perhaps the most important, is that he wanted to make a revolution in the field of art and aimed to reach the level of civilised countries.
Abstract: Innovations in technology have created new ethical
challenges. Essential use of electronic communication in the
workplace has escalated at an astronomical rate over the past decade.
As such, legal and ethical dilemmas confronted by both the employer
and the employee concerning managerial control and ownership of e-information
have increased dramatically in the USA. From the
employer's perspective, ownership and control of all information
created for the workplace is an undeniable source of economic
advantage and must be monitored zealously. From the perspective of
the employee, individual rights, such as privacy, freedom of speech,
and freedom from unreasonable search and seizure, continue to be
stalwart legal guarantees that employers are not legally or ethically
entitled to abridge in the workplace. These issues have been the
source of great debate and the catalyst for legal reform. The fine line
between ethical and legal has been complicated by emerging
technologies. This manuscript will identify and discuss a number of
specific legal and ethical issues raised by the dynamic electronic
workplace and conclude with suggestions that employers should
follow to respect the delicate balance between employees' legal
rights to privacy and the employer's right to protect its knowledge
systems and infrastructure.
Abstract: We analyze the effectiveness of different pseudo-noise (PN) and orthogonal sequences for encrypting speech signals in terms of perceptual intelligibility. A speech signal can be viewed as a sequence of correlated samples, and each sample as a sequence of bits. The residual intelligibility of the speech signal can be reduced by removing the correlation among the speech samples. PN sequences have random-like properties that help in reducing this correlation. The mean square aperiodic auto-correlation (MSAAC) and the mean square aperiodic cross-correlation (MSACC) measures are used to test the randomness of the PN sequences. The results of the investigation show the effectiveness of large Kasami sequences for this purpose among the many PN sequences examined.
Abstract: This paper considers the problem of cryptanalysis of stream ciphers. We attempt to improve on existing attacks on stream ciphers and to distinguish the portions of ciphertext, obtained by the encryption of plaintext, in which some parts of the text are random and the rest are non-random. The paper presents a tutorial introduction to symmetric cryptography. The basic information-theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP systems in computer networks using the LFSR algorithm. The implementation is developed in Java 2. The LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). The paper implements the encryption module from speech signals to ciphertext and the decryption module from ciphertext back to speech signals.
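The keystream generation and XOR encryption described above can be sketched as below. This is a Python illustration of a generic Fibonacci-style LFSR with hypothetical tap positions, not the paper's Java 2 implementation:

```python
def lfsr_keystream(seed_bits, taps, n):
    # Fibonacci LFSR: emit the last register bit, XOR the tapped bits
    # to form the feedback, and shift the feedback in at the front.
    state = list(seed_bits)
    stream = []
    for _ in range(n):
        stream.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return stream

def xor_cipher(bits, keystream):
    # XOR with the keystream; the same call both encrypts and decrypts.
    return [b ^ k for b, k in zip(bits, keystream)]
```

Because XOR is its own inverse, the receiver regenerates the identical keystream from the shared seed and applies `xor_cipher` again to recover the plaintext bits.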
Abstract: In this paper, an extended version of the directionally constrained minimization of power (DCMP) algorithm for broadband signals is proposed. The DCMP algorithm is a useful technique for extracting a target signal from the signals observed by a microphone array system. In the DCMP algorithm, the output power of the microphone array is minimized under a constraint of constant responses to the directions of arrival (DOAs) of specific signals. In our algorithm, by limiting the directional constraint to the direction perpendicular to the sensor array, the computation time is reduced.
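The constrained power minimization at the heart of DCMP has a standard closed-form solution (the linearly constrained minimum-power weights). A minimal numpy sketch, with the array covariance matrix R, constraint matrix C and desired response vector f assumed given:

```python
import numpy as np

def dcmp_weights(R, C, f):
    # Minimize w^H R w subject to C^H w = f:
    #   w = R^{-1} C (C^H R^{-1} C)^{-1} f
    Ri_C = np.linalg.solve(R, C)          # R^{-1} C without explicit inverse
    return Ri_C @ np.linalg.solve(C.conj().T @ Ri_C, f)
```

With an identity covariance and a single steering-vector constraint, the weights reduce to a scaled copy of the steering vector, and the response in the constrained direction is exactly f.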
Abstract: We propose a new perspective on speech
communication using blind source separation. The original speech is
mixed with key signals consisting of the mixing matrix, chaotic
signals and a random noise. However, parts of the keys (the mixing
matrix and the random noise) are not needed for decryption. In a
practical implementation, one can encrypt the speech by changing the
noise signal every time. Hence, the present scheme retains the advantages
of a one-time-pad encryption while avoiding its drawbacks in key
exchange. It is demonstrated that the proposed scheme is immune
to traditional attacks.
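The mixing step can be sketched as below, with a hypothetical 3x3 mixing matrix and synthetic key signals. A real receiver would apply blind source separation; here we only verify that an invertible mixture preserves the speech:

```python
import numpy as np

def encrypt(speech, chaotic, noise, A):
    # Stack the speech with the key signals and mix them linearly;
    # no transmitted channel exposes any single source directly.
    sources = np.vstack([speech, chaotic, noise])
    return A @ sources
```

With an invertible A, `np.linalg.inv(A) @ mixed` recovers the stacked sources exactly; a BSS algorithm approximates this inversion without knowing A, which is what removes the mixing matrix from the key material.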
Abstract: We present a novel scheme to recognize isolated speech
signals using certain statistical parameters derived from those signals.
The determination of the statistical estimates is based on extracted
signal information rather than the original signal information, in
order to reduce the computational complexity. After extracting the
speech signal from ambient noise, subtle details of these estimates
are first exploited to segregate polysyllabic words from
monosyllabic ones. Precise recognition of each distinct word is
then carried out by analyzing the histogram obtained from this
information.
Abstract: Automatic Speech Recognition (ASR) applied to the
Arabic language is a challenging task. This is mainly due to
language specificities that confront researchers with multiple
difficulties, such as insufficient linguistic resources and the very
limited number of available transcribed Arabic speech corpora. In
this paper, we are interested in the development of an HMM-based
ASR system for the Standard Arabic (SA) language. Our fundamental
research goal is to select the most appropriate acoustic parameters
describing each audio frame, acoustic models and speech recognition
unit. To achieve this purpose, we analyze the effect of varying the frame
windowing (size and period), the number of acoustic parameters resulting
from feature extraction methods traditionally used in ASR, the speech
recognition unit, the number of Gaussians per HMM state and the number of
embedded re-estimations of the Baum-Welch algorithm. To evaluate
the proposed ASR system, a multi-speaker SA connected-digits
corpus is collected, transcribed and used throughout all experiments.
A further evaluation is conducted on a speaker-independent continuous
SA speech corpus. The phoneme recognition rate is 94.02%, which is
relatively high compared with another ASR system
evaluated on the same corpus.
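The frame windowing whose size and period are varied in the experiments amounts to splitting the signal into overlapping analysis frames; a minimal sketch (the frame length and hop values used below are arbitrary examples, not the paper's settings):

```python
def frame_signal(x, frame_len, hop):
    # Overlapping analysis frames: a new frame starts every `hop`
    # samples and spans `frame_len` samples.
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, hop)]
```

For example, `frame_signal(list(range(10)), 4, 2)` yields four frames, the first being `[0, 1, 2, 3]`; increasing `frame_len` or `hop` is exactly the "size and period" trade-off studied in the paper.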
Abstract: Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species, etc.). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, their potential as a robust substitute for cluster analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid, and they provide a powerful tool for data visualization. In this paper, SOM is used to create a toroidal mapping of a two-dimensional lattice to perform cluster analysis on the results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivars, referred to as the "wine recognition data" located in the University of California-Irvine database. The results are encouraging, and it is believed that SOM would make an appealing and powerful decision-support tool for clustering tasks and for data visualization.
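The combination of competitive learning with an a priori grid can be sketched in a single SOM update step. This is a simplified illustration with invented numbers; the paper's toroidal lattice would additionally wrap the grid distances:

```python
import numpy as np

def som_step(weights, grid, x, lr, sigma):
    # One SOM update: find the best-matching unit (BMU), then move
    # every node toward x, weighted by a Gaussian neighborhood
    # function evaluated on the fixed grid coordinates.
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))
    return weights + lr * h[:, None] * (x - weights)
```

The neighborhood term `h` is what "smooths the clusters with respect to the grid": nodes close to the BMU on the grid move almost as far as the BMU itself, so nearby grid positions end up representing nearby clusters.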
Abstract: In this paper, we present an enhanced noise reduction method for robust speech recognition using an Adaptive Gain Equalizer with Nonlinear Spectral Subtraction. In the Adaptive Gain Equalizer (AGE) method, the input signal is divided into a number of subbands that are individually weighted in the time domain, according to an estimate of the short-time Signal-to-Noise Ratio (SNR) in each subband at every time instant. Instead of focusing on suppressing the noise, the method focuses on speech enhancement. When the method was analyzed under various noise conditions for speech recognition, it was found that the AGE algorithm has an obvious failure point at an SNR of -5 dB, with inadequate noise suppression below this point. This work proposes the implementation of AGE coupled with Nonlinear Spectral Subtraction (AGE-NSS) for robust speech recognition. The experimental results show that our AGE-NSS outperforms AGE when the SNR drops below the -5 dB level.
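The spectral-subtraction stage that backs up AGE at low SNR can be sketched as follows. The over-subtraction factor `alpha` and spectral floor `beta` are hypothetical values, not the paper's tuned parameters:

```python
import numpy as np

def nonlinear_spectral_subtract(mag, noise_mag, alpha=2.0, beta=0.01):
    # Over-subtract the noise magnitude estimate, then floor the
    # result at a fraction of the input magnitude so bins never go
    # negative (the floor also limits musical-noise artifacts).
    cleaned = mag - alpha * noise_mag
    return np.maximum(cleaned, beta * mag)
```

Bins dominated by speech lose only the noise estimate, while bins dominated by noise collapse to the small floor `beta * mag`, which is the extra suppression AGE alone lacks below -5 dB.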
Abstract: This paper evaluates sound parameterization methods for recognizing some spoken Arabic words, namely the digits from zero to nine. Each isolated spoken word is represented by a single template based on a specific recognition feature, and recognition is based on the Euclidean distance from those templates. The performance analysis covers four parameterization features: the Burg Spectrum Analysis, the Walsh Spectrum Analysis, the Thomson Multitaper Spectrum Analysis and the Mel Frequency Cepstral Coefficients (MFCC) features. The main aim of this paper is to compare, analyze, and discuss the outcomes of spoken Arabic digit recognition systems based on the selected recognition features. The results acquired confirm that the use of MFCC features is a very promising method for recognizing spoken Arabic digits.
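The template-matching rule described above reduces to a nearest-neighbour search in feature space; a minimal sketch with made-up two-dimensional templates standing in for the per-digit feature templates:

```python
import numpy as np

def classify(features, templates):
    # Return the label of the template nearest in Euclidean distance.
    return min(templates,
               key=lambda lbl: np.linalg.norm(features - templates[lbl]))
```

In the paper's setting, each template would be the feature vector (Burg, Walsh, multitaper or MFCC based) of one spoken digit, and a test utterance is assigned to the digit whose template it lies closest to.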
Abstract: In this project, a tele-operated anthropomorphic
robotic arm and hand is designed and built as a versatile robotic arm
system. The robot can manipulate objects through operations such as
pick and place. It is also able to function by itself, in
standalone mode.
Firstly, the robotic arm is built to interface with a personal
computer via a serial servo controller circuit board. The circuit board
enables the user to completely control the robotic arm and, moreover,
accepts feedback from the user. The control circuit board uses a
powerful integrated microcontroller, a PIC (Programmable Interface
Controller). The PIC is first programmed using BASIC (Beginner's
All-purpose Symbolic Instruction Code) and is used as the 'brain'
of the robot. In addition, a user-friendly Graphical User Interface
(GUI) is developed as the serial servo interface software using
Microsoft's Visual Basic 6.
The second part of the project is to use speech recognition to control
the robotic arm. A speech recognition circuit board is constructed
with onboard components such as a PIC and other integrated circuits. It
replaces the computer's Graphical User Interface. The robotic arm is
able to receive instructions as spoken commands through a
microphone and perform operations according to those commands,
such as picking and placing operations.
Abstract: In real-field applications, the correct determination of voice segments greatly improves the overall system accuracy and minimises the total computation time. This paper presents reliable measures of speech compression by detecting the endpoints of the speech signals prior to compressing them. The two compression schemes used are the Global threshold and the Level-Dependent threshold techniques. The performance of the proposed method is tested with the Signal-to-Noise Ratio, Peak Signal-to-Noise Ratio and Normalized Root Mean Square Error measures.
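Endpoint detection with a single global threshold can be sketched as below; the frame length and threshold are illustrative values, and the Level-Dependent variant (which adapts the threshold) is not shown:

```python
import numpy as np

def detect_endpoints(x, frame_len, threshold):
    # Keep the span of frames whose short-time energy exceeds a global
    # threshold; returns (start_sample, end_sample) or None if the
    # signal contains no active frames.
    n_frames = len(x) // frame_len
    energy = [float(np.sum(x[i * frame_len:(i + 1) * frame_len] ** 2))
              for i in range(n_frames)]
    active = [i for i, e in enumerate(energy) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Only the samples between the detected endpoints are passed to the compressor, which is what reduces both the bit budget and the computation time.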
Abstract: Distant-talking voice-based HCI systems suffer from
performance degradation due to mismatch between the acoustic
speech (runtime) and the acoustic model (training). Mismatch is
caused by the change in the power of the speech signal as observed at
the microphones. This change is greatly influenced by the change in
distance, affecting speech dynamics inside the room before reaching
the microphones. Moreover, as the speech signal is reflected, its
acoustical characteristic is also altered by the room properties. In
general, power mismatch due to distance is a complex problem. This
paper presents a novel approach in dealing with distance-induced
mismatch by intelligently sensing instantaneous voice power variation
and compensating model parameters. First, the distant-talking speech
signal is processed through microphone array processing, and the
corresponding distance information is extracted. Distance-sensitive
Gaussian Mixture Models (GMMs), pre-trained to capture both
speech power and room property are used to predict the optimal
distance of the speech source. Consequently, pre-computed statistical
priors corresponding to the optimal distance are selected to correct
the statistics of the generic model, which was frozen during training.
Thus, the combined model is post-conditioned to match the power
of the instantaneous speech acoustics at runtime. This results in an
improved likelihood of predicting the correct speech command at
farther distances. We experiment using real data recorded inside two
rooms. Experimental evaluation shows voice recognition performance
using our method is more robust to the change in distance compared
to the conventional approach. In our experiment, under the most
acoustically challenging environment (i.e., Room 2: 2.5 meters), our
method achieved 24.2% improvement in recognition performance
against the best-performing conventional method.
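The distance-prediction step can be illustrated with a drastically simplified sketch: one Gaussian per candidate distance instead of a full GMM over speech power and room features (all numbers below are invented):

```python
import math

def log_gauss(x, mean, var):
    # Log-density of a one-dimensional Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict_distance(power, models):
    # models: {distance_m: (mean_power, var)}; choose the distance
    # whose pre-trained power model best explains the observation.
    return max(models, key=lambda d: log_gauss(power, *models[d]))
```

The selected distance then indexes the pre-computed statistical priors used to correct the frozen generic model, so the expensive model adaptation reduces to a table lookup at runtime.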
Abstract: There have been significant improvements in automatic
voice recognition technology. However, existing systems still face difficulties,
particularly when used by non-native speakers with accents.
In this paper we address the problem of identifying the English accented
speech of speakers from different backgrounds. Once an accent is
identified, the speech recognition software can utilise a training set from
the appropriate accent and thereby improve the efficiency and accuracy
of the speech recognition system. We introduce the Q factor, which
is defined by the sum of relationships between the frequencies of the
formants. Four different accents were considered and experimented
with in this research. A scoring method was introduced in order to
analyse accents effectively. The proposed concept indicates that an
accent can be identified by analysing its formants.
Abstract: This paper presents the cepstral and trispectral
analysis of speech signals produced by normal men, by men with
defective audition (deaf, profoundly deaf) and by others affected by
tracheotomy; the trispectral analysis is based on parametric methods
(autoregressive, AR) using the fourth-order cumulant. These
analyses are used to detect and compare the pitches and the formants
of the corresponding voiced sounds (vowels /a/, /i/ and /u/). The first
results appear promising: it seems, after several experiments, that there
is no deformation of the spectrum, as one might have supposed
at the beginning. However, these pathologies influence two
characteristics: defective audition influences the formants, whereas
tracheotomy influences the fundamental frequency (pitch).
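A standard cepstral pitch estimator, of the kind used for such pitch comparisons, can be sketched as follows; the window type and the pitch search range are assumptions, not the paper's settings:

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    # Real cepstrum of a windowed frame: for voiced speech, the
    # dominant peak in the quefrency range of plausible pitch
    # periods gives the fundamental frequency.
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    q = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / q
```

The low-quefrency part of the same cepstrum carries the spectral envelope, i.e. the formants, which is why one analysis supports comparing both characteristics across the speaker groups.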
Abstract: This paper introduces an automatic voice classification
system for the diagnosis of individual constitution based on Sasang
Constitutional Medicine (SCM) in Traditional Korean Medicine
(TKM). To develop this algorithm, we used the voices of
309 female speakers and extracted a total of 134 speech features from
voice data consisting of 5 sustained vowels and one sentence. The
classification system, based on a rule-based algorithm derived
from a nonparametric statistical method, produces 3 types of decision:
reserved, positive and negative. In conclusion, 71.5% of the
voice data were diagnosed by this system, of which 47.7% were
correct positive decisions and 69.7% were correct negative decisions.
Abstract: This paper presents a vocoder that obtains high-quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model which extracts a few coding parameters; three consecutive frames are grouped into a superframe, and joint vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4 kbps LPC10e and approximately the same as that of 2.4 kbps MELP, with high robustness.
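The vector quantization step that makes the 600 bps rate possible reduces each superframe parameter vector to a single codebook index; a minimal sketch with a made-up codebook shared by encoder and decoder:

```python
import numpy as np

def vq_encode(vec, codebook):
    # Transmit only the index of the nearest codeword (Euclidean).
    return int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))

def vq_decode(index, codebook):
    # The decoder reconstructs the vector from the shared codebook.
    return codebook[index]
```

Quantizing the three frames of a superframe jointly lets one index cover correlated parameters, which is how the inter-frame redundancy is converted into bit savings.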