Abstract: This paper presents a vocoder that produces high-quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model, which needs only a few coding parameters. In addition, three consecutive frames are grouped into a superframe and jointly vector quantized to achieve high coding efficiency. Inter-frame redundancy is exploited by using distinct quantization schemes for the different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of the 2.4 kbps LPC10e coder and approximately the same as that of the 2.4 kbps MELP coder, with high robustness.
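As a rough illustration of the superframe idea, the Python sketch below groups frames three at a time, labels each superframe with its unvoiced/voiced (U/V) pattern, and searches a pattern-specific codebook for the concatenated parameter vector. The codebooks, the single scalar parameter per frame, and the function names are hypothetical placeholders. The abstract does not give the coder's actual parameter set, codebook sizes, or bit allocation.

```python
from itertools import product

FRAMES_PER_SUPERFRAME = 3  # three consecutive frames form one superframe


def group_superframes(frames):
    """Split per-frame parameter vectors into superframes of three frames.
    Trailing frames that do not fill a whole superframe are dropped."""
    usable = len(frames) - len(frames) % FRAMES_PER_SUPERFRAME
    return [frames[i:i + FRAMES_PER_SUPERFRAME]
            for i in range(0, usable, FRAMES_PER_SUPERFRAME)]


def voicing_pattern(voiced_flags):
    """Map a tuple of three booleans to a label such as 'VUV'."""
    return "".join("V" if v else "U" for v in voiced_flags)


def nearest_codeword(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])))


def quantize_superframe(superframe, voiced_flags, codebooks):
    """Jointly quantize one superframe: concatenate its three frame vectors
    and search the codebook assigned to its U/V pattern."""
    pattern = voicing_pattern(voiced_flags)
    joint = [x for frame in superframe for x in frame]
    return pattern, nearest_codeword(joint, codebooks[pattern])


# Hypothetical toy codebooks: one 2-entry codebook per U/V pattern (8 patterns),
# for frames carrying a single scalar parameter each.
PATTERNS = ["".join(p) for p in product("UV", repeat=3)]
codebooks = {p: [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]] for p in PATTERNS}

frames = [[0.9], [1.1], [0.8], [0.1], [0.0], [0.2]]
voicing = [(True, True, True), (False, False, True)]
for sf, flags in zip(group_superframes(frames), voicing):
    print(quantize_superframe(sf, flags, codebooks))  # ('VVV', 1) then ('UUV', 0)
```

With these toy codebooks, the voiced superframe [0.9], [1.1], [0.8] maps to index 1 of the VVV codebook. The mostly unvoiced superframe maps to index 0 of the UUV codebook.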
Abstract: The goal of this project is to design a system to
recognize voice commands. Most voice recognition systems
contain two main modules: “feature extraction” and “feature
matching”. In this project, the MFCC algorithm is used to implement
the feature extraction module. Using this algorithm, the cepstral
coefficients are calculated on the mel frequency scale. The VQ (vector
quantization) method is used to reduce the amount of data and thereby
decrease computation time. In the feature matching stage, the Euclidean
distance is used as the similarity criterion. Because the algorithms used
are highly accurate, the accuracy of the voice command system is also
high. With these algorithms, each command was repeated at least 5 times
in a single training session and then twice in each testing
session, and a zero error rate was achieved in recognizing the commands.
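The VQ and matching stages can be sketched as follows, assuming MFCC vectors have already been extracted for each frame. Each command gets its own codebook. Here the codebook is trained with plain k-means, a simplification of the LBG algorithm. An utterance is assigned to the command whose codebook gives the lowest average Euclidean distortion. The toy 2-D vectors, the codebook size, and the function names are illustrative assumptions, not the project's settings.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Train a VQ codebook with plain k-means (a simplification of LBG)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            k = min(range(size), key=lambda j: euclidean(v, codebook[j]))
            clusters[k].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the previous codeword if a cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(euclidean(v, c) for c in codebook) for v in vectors) / len(vectors)


def recognize(vectors, codebooks):
    """Return the command whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda cmd: avg_distortion(vectors, codebooks[cmd]))


# Toy 2-D "MFCC" vectors per command (illustrative only).
training = {
    "open": [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.1]],
    "close": [[5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [5.0, 5.1]],
}
codebooks = {cmd: train_codebook(vecs, 2) for cmd, vecs in training.items()}
print(recognize([[0.05, 0.15], [0.2, 0.2]], codebooks))  # open
```

Storing a small codebook per command, rather than every training frame, is what reduces the amount of data and the matching time.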
Abstract: The transformation of vocal characteristics aims at
modifying a voice so that the intelligibility of an aphonic voice is
increased, or so that speech from one speaker (the source speaker) is
perceived as if another speaker (the target speaker) had uttered it. In
this paper, the current state-of-the-art methodology for voice
characteristics transformation is reviewed. Special emphasis is placed
on voice transformation methodology, and issues in improving the
intelligibility and naturalness of transformed speech are
discussed. In particular, it is suggested that the modulation theory
of speech be used as a basis for research on high-quality voice transformation.
This approach allows one to separate the linguistic, expressive, organic
and perspectival information in speech, based on an analysis of how
they are fused when speech is produced. This theory therefore
provides the fundamentals not only for manipulating non-linguistic,
extra-/paralinguistic and intra-linguistic variables for voice
transformation, but also for easily transposing
existing voice transformation methods to emotion-related voice
quality transformation and speaking style transformation. From the
perspectives of human speech production and perception, popular
voice transformation techniques are described and classified
according to whether their underlying principles stem from speech production
mechanisms, perception mechanisms, or both. In addition, the advantages
and limitations of voice transformation techniques and the
experimental manipulation of vocal cues are discussed through
examples from past and present research. Finally, conclusions are drawn and a
road map toward more natural voice transformation
algorithms is outlined.
Abstract: Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical, outlier-detection-based strategies to separate the silence/unvoiced part of a speech signal from the voiced portion. The proposed methods are based on the 3σ edit rule and the Hampel identifier, and they are compared with two conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution-based methods. The results obtained by applying the proposed strategies to some test voice signals are encouraging.
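A minimal sketch of the two outlier rules, applied to per-frame short-time energies, follows. It assumes that voiced frames show up as high-energy outliers. The 3σ rule uses the mean and standard deviation. The Hampel identifier replaces them with the median and the median absolute deviation (MAD), scaled by 1.4826 so that it estimates σ for Gaussian data. The abstract does not state the paper's exact statistic, so applying the rules to frame energies, and the frame length, are assumptions.

```python
import statistics


def frame_energies(signal, frame_len):
    """Short-time energy of consecutive non-overlapping frames."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def three_sigma_voiced(energies, k=3.0):
    """3-sigma edit rule: flag frames with energy above mean + k * std."""
    mu = statistics.mean(energies)
    sigma = statistics.pstdev(energies)
    return [i for i, e in enumerate(energies) if e > mu + k * sigma]


def hampel_voiced(energies, k=3.0):
    """Hampel identifier: flag frames above median + k * 1.4826 * MAD."""
    med = statistics.median(energies)
    mad = statistics.median(abs(e - med) for e in energies)
    return [i for i, e in enumerate(energies) if e > med + k * 1.4826 * mad]
```

Consider twenty low-energy frames followed by three frames with energy 50.0. The three loud frames inflate the mean and standard deviation enough that the 3σ rule flags nothing (masking), while the Hampel identifier flags all three. This robustness is the usual motivation for median-based rules.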
Abstract: The nonlinear damping behavior is usually ignored in
the design of a miniature moving-coil loudspeaker. However, when the
loudspeaker operates in air, the damping parameter varies with the
voice-coil displacement because of viscous air flow. The
present paper formulates the identification of the nonlinear damping
parameter in the lumped-parameter model of the loudspeaker as an inverse
problem. Theoretical results for the nonlinear
damping are verified using a laser displacement measurement
scanner. These results indicate that the nonlinearity of the damping
parameter differs greatly between air and vacuum. It is believed
that the results of the present work can be applied to the diagnosis and
sound quality improvement of miniature loudspeakers.
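A one-degree-of-freedom mechanical model can illustrate the inverse problem: m·x'' + R(x)·x' + k·x = F, where F is the driving force (Bl·i for a moving-coil driver). The sketch makes several simplifying assumptions:

- a hypothetical quadratic damping law R(x) = r0 + r2·x²;
- known mass and stiffness;
- noise-free synthetic data in normalized units.

Under these assumptions, it recovers r0 and r2 by linear least squares. With measured laser displacement data, velocity and acceleration would first have to be estimated by differentiation and filtering, which this sketch does not attempt.

```python
import math


def sine_force(amplitude, freq_hz):
    """Driving force F(t) = amplitude * sin(2*pi*f*t)."""
    return lambda t: amplitude * math.sin(2 * math.pi * freq_hz * t)


def simulate(m, k, r0, r2, force, dt, steps):
    """Semi-implicit Euler simulation of m*x'' + (r0 + r2*x**2)*x' + k*x = F(t).
    Returns samples (x, v, a, F) at each time step."""
    x = v = 0.0
    samples = []
    for n in range(steps):
        f = force(n * dt)
        a = (f - (r0 + r2 * x * x) * v - k * x) / m
        samples.append((x, v, a, f))
        v += a * dt
        x += v * dt
    return samples


def identify_damping(samples, m, k):
    """Least-squares estimate of (r0, r2), with m and k assumed known.
    The damping force y = F - m*a - k*x is modelled as r0*v + r2*x**2*v."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x, v, a, f in samples:
        y = f - m * a - k * x
        p1, p2 = v, x * x * v
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        b1 += p1 * y
        b2 += p2 * y
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    return (b1 * s22 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


# Normalised toy values, not measured loudspeaker parameters.
data = simulate(m=1.0, k=100.0, r0=0.5, r2=2.0,
                force=sine_force(30.0, 1.0), dt=1e-3, steps=5000)
print(identify_damping(data, m=1.0, k=100.0))  # close to (0.5, 2.0)
```

Because the damping force is linear in the unknown coefficients, the identification reduces to an ordinary least-squares fit once displacement, velocity, and acceleration are available.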
Abstract: In this paper we present a supervised, text-independent voice recognition technique. For the feature extraction stage we propose a mel-cepstral approach. Our feature vector classification method combines a special nonlinear metric, derived from the Hausdorff distance for sets, with a minimum mean distance classifier.
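The classification stage can be sketched in two parts. First, the standard symmetric Hausdorff distance is computed between sets of mel-cepstral vectors. Second, a minimum mean distance rule assigns a test utterance to the speaker whose training utterances are, on average, closest to it. The paper uses a nonlinear metric *derived* from the Hausdorff distance, so the plain Hausdorff distance here is only a stand-in, and the data layout is an assumption.

```python
import math


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite sets of vectors."""
    d_ab = max(min(math.dist(a, b) for b in B) for a in A)
    d_ba = max(min(math.dist(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)


def classify(test_set, training):
    """Minimum mean distance classifier.
    training maps speaker -> list of feature-vector sets (one per utterance)."""
    def mean_distance(speaker):
        sets = training[speaker]
        return sum(hausdorff(test_set, s) for s in sets) / len(sets)
    return min(training, key=mean_distance)


# Toy 2-D "mel-cepstral" vectors (illustrative only).
training = {
    "alice": [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]],
    "bob": [[(5.0, 5.0), (6.0, 5.0)], [(5.0, 6.0)]],
}
print(classify([(0.0, 0.5), (1.0, 0.5)], training))  # alice
```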
Abstract: This article presents a simple way to use programmed voice commands to interface with the commercial digital and analogue input/output PCI cards used in robotics and automation applications. Robots and automation equipment can "listen" to voice commands and perform several different tasks, approaching human behavior and improving human-machine interfaces for the automation industry. Since most PCI digital and analogue I/O cards are sold with several DLLs (for use with different programming languages), speech recognition capability can be added using a standard speech recognition engine that is compatible with those languages. In this work, a Visual Basic 6 application (Visual Basic 6 being a widely used language at the time) was created that listens to several voice commands and can communicate directly with several standard 128-channel digital I/O PCI cards, which are used to control complete automation systems with up to (number of boards used) × 128 sensors and/or actuators.
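The command-to-I/O path can be sketched as a lookup table from recognized phrases to (board, line, value) triples. The `DigitalOutputs` class below is an in-memory stand-in for the vendor DLLs mentioned above. The abstract does not name their functions, and the command table is hypothetical. The sketch also shows the capacity arithmetic: the number of boards times 128 lines.

```python
BITS_PER_BOARD = 128  # digital I/O lines per PCI card, as in the abstract


class DigitalOutputs:
    """In-memory stand-in for a set of 128-line digital I/O boards.
    A real application would call the card vendor's DLL here instead."""

    def __init__(self, boards):
        self.state = [[0] * BITS_PER_BOARD for _ in range(boards)]

    def write_bit(self, board, line, value):
        self.state[board][line] = 1 if value else 0


# Hypothetical command table: recognized phrase -> (board, line, value).
COMMANDS = {
    "conveyor on": (0, 3, True),
    "conveyor off": (0, 3, False),
    "open gripper": (1, 0, True),
}


def handle(phrase, io):
    """Dispatch a phrase returned by the speech engine; False if unknown."""
    action = COMMANDS.get(phrase.strip().lower())
    if action is None:
        return False
    io.write_bit(*action)
    return True


def max_points(boards):
    """Total addressable sensors/actuators: number of boards x 128."""
    return boards * BITS_PER_BOARD
```

Keeping the phrase table separate from the I/O calls means that new commands can be added without touching the hardware layer.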
Abstract: This paper presents the design features of a rescue robot named CEO Mission II. Its body uses tracked wheels with double front flippers for climbing over collapsed structures and rough terrain. A 125 cm long, 5-joint mechanical arm is installed on the robot body. The arm is used not only for surveillance from above but also to reach victims more easily and quickly in order to obtain their vital signs. Two cameras and sensors for detecting vital signs are mounted at the tip of the multi-joint mechanical arm, and a third camera at the back of the robot is used for driving control. The hardware and software of the system that controls and monitors the rescue robot are explained. The control system is used to control the robot's locomotion and the 5-joint mechanical arm, and to switch devices on and off. The monitoring system gathers all information from the following sources:

- 7 distance sensors;
- IR temperature sensors;
- 3 CCD cameras;
- a voice sensor;
- the wheel encoders;
- yaw/pitch/roll angle sensors;
- a laser range finder;
- 8 spare A/D inputs.

All sensor and control data are exchanged with a remote control station via IEEE 802.11b Wi-Fi. The audio and video data are compressed and sent via a separate IEEE 802.11g Wi-Fi transmitter to achieve real-time response. At the remote control station, the robot's locomotion and the mechanical arm are controlled by joystick. In addition, a user-friendly GUI control program based on clicking and dragging was developed to make the arm's movement easy to control. The robot's travel map is plotted from the wheel encoder and yaw/pitch data, and a 2D obstacle map is plotted from the laser range finder data. The concept and design of this robot can be adapted to suit many other applications. The robot received the Best Technique award at the Thailand Rescue Robot Championship 2006, and all test results were satisfactory.
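Dead reckoning can approximate the travel map described above: each encoder distance increment is projected along the measured yaw and pitch angles and accumulated. The sketch makes three assumptions:

- the encoder gives the average distance traveled by the two tracks per interval;
- angles are in radians;
- `metres_per_tick` is a hypothetical calibration constant.

Track slip, which can be significant for tracked robots, is ignored.

```python
import math


def travel_map(encoder_ticks, yaw_pitch, metres_per_tick):
    """Dead-reckoning path from wheel-encoder increments and yaw/pitch angles.
    encoder_ticks: ticks travelled in each interval (average of both tracks).
    yaw_pitch: (yaw, pitch) in radians for each interval.
    Returns a list of (x, y, z) positions starting at the origin."""
    x = y = z = 0.0
    path = [(x, y, z)]
    for ticks, (yaw, pitch) in zip(encoder_ticks, yaw_pitch):
        d = ticks * metres_per_tick
        x += d * math.cos(pitch) * math.cos(yaw)  # horizontal component, east
        y += d * math.cos(pitch) * math.sin(yaw)  # horizontal component, north
        z += d * math.sin(pitch)                  # climb or descent
        path.append((x, y, z))
    return path


# Drive 1 m straight ahead, then turn 90 degrees left and drive another 1 m.
print(travel_map([100, 100], [(0.0, 0.0), (math.pi / 2, 0.0)], 0.01)[-1])
```

Plotting the (x, y) pairs gives the 2D travel map, and z records climbs over rubble.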
Abstract: Insufficient Quality of Service (QoS) of Voice over
Internet Protocol (VoIP) is a growing concern that has led to a need
for research and study. In this paper we investigate the performance
of VoIP and the impact of resource limitations on the performance of
Access Networks. The impact of Access Networks on VoIP performance
is particularly important in regions where Internet
resources are limited and the cost of improving these resources is
prohibitive. Perceived VoIP performance, as measured
by the mean opinion score [2] in experiments where subjects are asked
to rate communication quality, is determined by end-to-end delay on
the communication path, delay variation, packet loss, echo, the
coding algorithm in use, and noise. These performance indicators can
be measured, and the effect of the Access Network on them can be estimated.
This paper investigates the effect of congestion in the Access Network on the
overall performance of VoIP services in the presence of other
substantial Internet traffic, and ways in which Access Networks can
be designed to improve VoIP performance. Methods for analyzing
the impact of the Access Network on VoIP performance are
surveyed and reviewed. This paper also considers some approaches
for improving VoIP performance by carrying out experiments
using the Network Simulator version 2 (NS2) software, with a view to
gaining a better understanding of the design of Access Networks.
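A simplified form of the ITU-T G.107 E-model makes the link between these impairments and MOS concrete. The rating R starts from a default of about 93.2. It then loses a delay impairment Id and an effective equipment impairment Ie,eff, which covers the codec plus packet loss, and is mapped to an estimated MOS. The delay term uses the widely cited Cole and Rosenbluth approximation. The defaults are assumptions: G.711 without packet loss concealment, Ie = 0, Bpl = 4.3, and random loss. Echo, jitter-buffer behavior, and noise are not modeled.

```python
def delay_impairment(d_ms):
    """Simplified delay impairment Id (Cole & Rosenbluth approximation), d in ms."""
    return 0.024 * d_ms + (0.11 * (d_ms - 177.3) if d_ms > 177.3 else 0.0)


def equipment_impairment(ie, bpl, loss_pct, burst_r=1.0):
    """Effective equipment impairment Ie,eff (ITU-T G.107), loss in percent."""
    return ie + (95 - ie) * loss_pct / (loss_pct / burst_r + bpl)


def r_to_mos(r):
    """ITU-T G.107 mapping from the R-factor to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def estimate_mos(delay_ms, loss_pct, ie=0.0, bpl=4.3, r0=93.2):
    """Rough MOS estimate from R = R0 - Id - Ie,eff (other impairments ignored)."""
    r = r0 - delay_impairment(delay_ms) - equipment_impairment(ie, bpl, loss_pct)
    return r_to_mos(r)


print(round(estimate_mos(50, 0.0), 2), round(estimate_mos(300, 2.0), 2))
```

A congested access link typically raises both queueing delay and loss. In this model, the two effects subtract from the same R budget, which is why they compound.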
Abstract: This paper investigates the performance of a speech
recognizer in an interactive voice response system for speech signals
coded with a vector quantization technique, namely the
Multi Switched Split Vector Quantization technique. Recognition of the
coded output can be used in voice banking applications.
The recognizer for the coded speech
signals is based on Hidden Markov Models. The spectral distortion
performance, computational complexity, and memory requirements of
the Multi Switched Split Vector Quantization technique, as well as the
performance of the speech recognizer at various bit rates, have been
computed. The results show that the speech recognizer
performs better at 24 bits/frame, and that the
recognition rate varies from 100% to 93.33% across the
bit rates tested.
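A simplified switched split VQ can be sketched as follows. Each switch holds its own set of split codebooks. Every switch is tried, and the one giving the lowest total squared error is kept. The transmitted index therefore consists of the switch number plus one index per sub-vector, and bits per frame are counted as the switch bits plus the split-codebook index bits. The exact multi-switched structure used in the paper may differ, and the toy codebooks are hypothetical.

```python
import math


def nearest(vec, codebook):
    """Index and squared error of the closest codeword."""
    errs = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
    i = min(range(len(errs)), key=errs.__getitem__)
    return i, errs[i]


def split_vq(vec, split_codebooks):
    """Quantize each sub-vector with its own codebook (split VQ)."""
    indices, total_err, start = [], 0.0, 0
    for cb in split_codebooks:
        dim = len(cb[0])
        i, err = nearest(vec[start:start + dim], cb)
        indices.append(i)
        total_err += err
        start += dim
    return indices, total_err


def switched_split_vq(vec, switches):
    """Try every switch (each a set of split codebooks) and keep the best."""
    best = min(range(len(switches)), key=lambda s: split_vq(vec, switches[s])[1])
    indices, err = split_vq(vec, switches[best])
    return best, indices, err


def bits_per_vector(switches):
    """Switch-selection bits plus one index per split codebook."""
    return math.ceil(math.log2(len(switches))) + sum(
        math.ceil(math.log2(len(cb))) for cb in switches[0])


# Hypothetical toy codebooks: 2 switches, each splitting a 3-D vector as 2 + 1.
switches = [
    [[[0.0, 0.0], [1.0, 1.0]], [[0.0], [1.0]]],
    [[[5.0, 5.0], [6.0, 6.0]], [[5.0], [6.0]]],
]
print(switched_split_vq([5.1, 5.2, 6.0], switches), bits_per_vector(switches))
```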
Abstract: This work presents a novel means of extracting fixed-length parameters from voice signals, such that words can be recognized
in linear time. The power and the zero-crossing rate are first
calculated segment by segment from a voice signal, generating two
feature sequences. We then construct an FIR system
across these two sequences. The parameters of this FIR system, which serve
as the input to a multilayer perceptron recognizer, can be derived by
recursive LSE (least-squares estimation), so the complexity of the overall process is linear in the signal length. In the second part of
this work, we introduce a weighting factor λ to emphasize recent
input, which allows continuous speech signals to be recognized as well.
Experiments use voice signals of the numbers from zero to nine, spoken in Mandarin Chinese, and verify that the proposed method
recognizes voice signals efficiently and accurately.
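The two stages can be sketched as below. `segment_features` computes per-segment power and zero-crossing rate. `rls_fir` then fits a fixed-order FIR filter with recursive least squares, where `lam` plays the role of the weighting (forgetting) factor λ. Each update costs O(order²), so the total cost grows linearly with the number of segments. The output always has `order` coefficients, regardless of utterance length. The abstract does not say which sequence drives the FIR system, so using the zero-crossing sequence as the input and the power sequence as the target is an assumption.

```python
def segment_features(signal, seg_len):
    """Per-segment average power and zero-crossing rate."""
    power, zcr = [], []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        power.append(sum(s * s for s in seg) / seg_len)
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))
        zcr.append(crossings / (seg_len - 1))
    return power, zcr


def rls_fir(inputs, targets, order, lam=1.0, delta=100.0):
    """Recursive least-squares fit of FIR weights mapping inputs -> targets.
    lam < 1 exponentially down-weights old samples (emphasizes recent input)."""
    w = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(len(targets)):
        # Regressor: current and past inputs, zero before the sequence start.
        u = [inputs[n - i] if n >= i else 0.0 for i in range(order)]
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * Pu[i] for i in range(order))
        gain = [p / denom for p in Pu]
        err = targets[n] - sum(w[i] * u[i] for i in range(order))
        w = [w[i] + gain[i] * err for i in range(order)]
        # P <- (P - gain * u^T P) / lam, using u^T P = (P u)^T since P is symmetric.
        P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(order)]
             for i in range(order)]
    return w


# Toy check: recover a known 3-tap FIR from a deterministic test sequence.
inputs = [((n * 37) % 11) - 5.0 for n in range(300)]
true_w = [0.8, -0.3, 0.1]
targets = [sum(true_w[i] * inputs[n - i] for i in range(3) if n >= i)
           for n in range(300)]
print(rls_fir(inputs, targets, order=3))  # approximately [0.8, -0.3, 0.1]
```

In the recognizer, `power, zcr = segment_features(samples, seg_len)` followed by `rls_fir(zcr, power, order)` would yield the fixed-length feature vector for the perceptron. The segment length and order are left as free, illustrative choices.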
Abstract: In recent years, multimedia traffic, and VoIP services
in particular, have been growing dramatically. We present a new algorithm
that controls resource utilization and optimizes voice codec
selection during SIP call setup on the basis of the traffic conditions
estimated on the network path.
The most suitable methodologies and tools for real-time
evaluation of the available bandwidth on a network path have
been integrated with our proposed algorithm, which selects the best
codec for a VoIP call as a function of the instantaneous available
bandwidth on the path. The algorithm does not require any explicit
feedback from the network, which makes it easy to deploy over
the Internet. We have also performed extensive tests in real network
scenarios with a software prototype, verifying the algorithm's
efficiency with different network topologies and traffic patterns
between two SIP PBXs.
The promising results obtained during the experimental validation
of the algorithm are now the basis for extending it to a larger
set of multimedia services and for integrating our methodology
with existing PBX appliances.
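The selection step can be illustrated by comparing each codec's per-call IP bandwidth (payload plus 40 bytes of IPv4/UDP/RTP headers per packet) against the measured available bandwidth. The following are assumptions chosen for illustration:

- the codec list;
- the preference order;
- 20 ms packetization;
- the 0.8 safety margin.

The abstract does not describe the paper's decision rule or its bandwidth estimation tools.

```python
IP_UDP_RTP_HEADER_BYTES = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP); link layer ignored

# (name, payload bit rate in bit/s, packetization interval in ms), most preferred first
CODECS = [
    ("G.711", 64_000, 20),
    ("G.726-32", 32_000, 20),
    ("G.729", 8_000, 20),
]


def ip_bandwidth(bitrate, ptime_ms):
    """Per-direction IP-layer bandwidth of one call in bit/s, headers included."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = bitrate * ptime_ms / 1000 / 8
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second


def select_codec(available_bps, margin=0.8):
    """Most preferred codec whose bandwidth fits in margin * available, else None."""
    budget = margin * available_bps
    for name, bitrate, ptime in CODECS:
        if ip_bandwidth(bitrate, ptime) <= budget:
            return name
    return None


for bw in (200_000, 70_000, 35_000, 20_000):
    print(bw, select_codec(bw))
```

At 20 ms packetization, G.711 needs 80 kbit/s per direction at the IP layer, G.726-32 needs 48 kbit/s, and G.729 needs 24 kbit/s. Header overhead is therefore a large share of the cost for low-rate codecs.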
Abstract: A Web-based learning tool, the Learn IN Context
(LINC) system, designed for and used in some of an institution's
courses in mixed-mode learning, is presented in this paper. This
mode combines face-to-face and distance approaches to education.
LINC supports both collaborative and competitive learning. In
order to provide both learners and tutors with a more natural way to
interact with e-learning applications, a conversational interface has
been included in LINC. The components and essential features
of LINC+, the voice-enhanced version of LINC, are therefore described. We
report evaluation experiments of LINC/LINC+ in a real context of use:
a computer programming course taught at the Université de
Moncton (Canada). The findings show that when the learning
material is delivered in the form of a collaborative, voice-enabled
presentation, most learners appear to be satisfied with this
new medium and confirm that it does not negatively affect their
cognitive load.
Abstract: Unlike general-purpose processors, digital signal
processors (DSP processors) are strongly application-dependent. To
meet the needs of diverse applications, a wide variety of DSP
processors based on different architectures, ranging from
traditional designs to VLIW, have been introduced to the market over the
years. The functionality, performance, and cost of these processors
vary over a wide range. When selecting a processor that meets the
design criteria for an application, digital signal processing (DSP)
application developers are usually most concerned with processor
performance. Performance data are also essential for the designers of
DSP processors to improve their designs. Consequently, several DSP
performance benchmarks have been proposed over the past decade or
so. However, none of these benchmarks seems to include recent
DSP applications.
In this paper, we use a new benchmark that we recently developed
to compare the performance of popular DSP processors from Texas
Instruments and StarCore. The new benchmark is based on the
Selectable Mode Vocoder (SMV), a speech-coding program from the
recent third generation (3G) wireless voice applications. All
benchmark kernels are compiled by the compilers of the respective
DSP processors and run on their simulators. Weighted arithmetic
mean of clock cycles and arithmetic mean of code size are used to
compare the performance of five DSP processors.
In addition, we study how the performance of a processor is
affected by code structure, processor architecture features, and
compiler optimization. The extensive experimental data gathered,
analyzed, and presented in this paper should help DSP
processor and compiler designers meet their specific design goals.
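The two summary statistics are straightforward to compute. The sketch below assumes that each processor has per-kernel cycle counts and code sizes, and that each kernel has a weight (for instance, how often it is invoked in SMV). The abstract does not say how the weights are chosen, so the weighting scheme is an assumption. The processor name and numbers in the example are placeholders, not measured results.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def arithmetic_mean(values):
    return sum(values) / len(values)


def summarize(cycles, code_sizes, weights):
    """cycles, code_sizes: processor -> per-kernel lists (same kernel order).
    weights: one weight per kernel. Returns processor -> (cycles, code size)."""
    return {p: (weighted_mean(cycles[p], weights), arithmetic_mean(code_sizes[p]))
            for p in cycles}


# Placeholder inputs for illustration only.
print(summarize({"DSP-A": [100, 300]}, {"DSP-A": [10, 20]}, [3, 1]))
```

Weighting cycle counts while averaging code size uniformly reflects that execution time depends on how often each kernel runs, whereas every kernel occupies memory once.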
Abstract: The purpose of this research is to develop a security model that protects voice communication over digital networks against eavesdropping. The proposed model, a so-called voice data transformation system, provides an encryption scheme and a personal secret key exchange between the communicating parties, resulting in a truly private conversation. The system operates in two main steps. The first step is the exchange of personal secret keys for use in the data encryption process during the conversation. The key owner is free to choose the key, so it is recommended that a different key be exchanged with each conversational party and that the key for each party be recorded in the memory provided in the client device. The second step is to set and record another personal encryption option: either all frames or only some of the frames are encrypted, expressed as a ratio of 1:M. Each party can be given a different personal secret key and a different 1:M setting without the intervention of the service operator. This makes it considerably harder for an eavesdropper to discover the key used during a conversation, especially within a short period of time. The scheme therefore offers safe and effective protection against voice eavesdropping. The results of the implementation indicate that the system performs its function accurately as designed. The proposed system is thus suitable for effective protection against voice eavesdropping over digital networks, such as mobile phone networks and VoIP, without requiring any changes to existing network systems.
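The 1:M option can be sketched as encrypting one frame in every M, with a different key per party. To stay within the Python standard library, the sketch derives a keystream from SHA-256 in counter mode and XORs it with the frame. This is only an illustration, not a vetted cipher. A deployed system would use an authenticated cipher such as AES-GCM with a fresh nonce per call. Reading 1:M as "one encrypted frame per M frames" is an interpretation of the abstract, and the party settings table is hypothetical.

```python
import hashlib


def keystream(key: bytes, frame_no: int, length: int) -> bytes:
    """Illustrative keystream: SHA-256 over key || frame number || counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = key + frame_no.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out.extend(hashlib.sha256(block).digest())
        counter += 1
    return bytes(out[:length])


def transform(frames, key, m):
    """XOR-encrypt one frame in every m (m = 1 encrypts every frame).
    Applying it again with the same key and m restores the original frames."""
    result = []
    for n, frame in enumerate(frames):
        if n % m == 0:
            ks = keystream(key, n, len(frame))
            frame = bytes(a ^ b for a, b in zip(frame, ks))
        result.append(frame)
    return result


# Hypothetical per-party settings stored on the client: party -> (key, M).
PARTY_SETTINGS = {"alice": (b"key-for-alice", 1), "bob": (b"key-for-bob", 4)}

key, m = PARTY_SETTINGS["bob"]
frames = [bytes([i]) * 40 for i in range(8)]  # stand-ins for coded voice frames
encrypted = transform(frames, key, m)
print(transform(encrypted, key, m) == frames)  # True
```

Because XOR is its own inverse, the receiver runs the same `transform` with the shared key and M to recover the frames.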
Abstract: Voice over Internet Protocol (VoIP) applications, commonly known as softphones, have been gaining an increasingly large market in today's telecommunications world, and the trend is expected to continue as additional features are added. These include leveraging existing presence services and location and contextual information to enable more ubiquitous and seamless communications. In this paper, we discuss the concept of seamless session transfer for real-time applications such as VoIP and IPTV, and our prototype implementation of this concept in a selected open source VoIP application. The first part of this paper evaluates and assesses the performance of some commonly used open source VoIP applications, namely Ekiga, Kphone, Linphone and Twinkle, in order to select one of them for implementing our seamless session transfer design. Subjective testing was carried out to evaluate the audio performance of these VoIP applications and to rank them according to their Mean Opinion Score (MOS) results. The second part of this paper discusses the performance evaluation of our prototype implementation of session transfer using Linphone.
Abstract: Biometric methods include recognition techniques
based on fingerprint, iris, hand geometry, voice, face, ears and gait. The gait recognition approach has some advantages: for example, it
does not require the prior awareness of the observed subject, and it can
record many biometric features for deeper analysis. However,
most of the proposed approaches have a high computational cost. This
paper presents a gait recognition system based on feature extraction from a
bounding rectangle drawn around the observed person. Statistical results
on a database of 500 videos are presented.
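One inexpensive feature of this kind is the width-to-height ratio of the bounding rectangle around the person's silhouette, tracked over time. The sketch assumes binary silhouette masks (for example, from background subtraction) stored as lists of 0/1 rows. The abstract does not specify the paper's actual features.

```python
def bounding_rectangle(mask):
    """(top, left, bottom, right) of the foreground pixels (value 1), or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def aspect_ratio_sequence(masks):
    """Width / height of the bounding rectangle in each non-empty frame."""
    ratios = []
    for mask in masks:
        box = bounding_rectangle(mask)
        if box is None:
            continue  # skip frames where no person was segmented
        top, left, bottom, right = box
        ratios.append((right - left + 1) / (bottom - top + 1))
    return ratios


frame = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 1, 0, 0]]
print(bounding_rectangle(frame), aspect_ratio_sequence([frame]))
```

Over a walking sequence, the ratio oscillates as the legs open and close, and its period and amplitude can serve as cheap gait descriptors.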
Abstract: In industrial waste management, the conversion from paper invoices to electronic forms is currently under way in developed countries. One difficulty in such computerization is the lack of synchronization between the actual goods and the corresponding data managed by the server. To address this, a system that attaches a QR code to the waste material has been developed. The code is read at each stage, from discharge to disposal, so that progress at each stage can easily be reported. The system can be linked with Japan's public digital authentication service for waste, taking advantage of its strengths, and can be used to submit reports to the regulatory authorities. Its usefulness was confirmed in a verification test, and it has been put into actual practice.
Abstract: Cellular networks provide voice and data services to mobile users. To deliver these services, a cellular network must be able to track the users' locations and allow users to move during conversations. These capabilities are provided by location management. Location management in mobile communication systems is concerned with the network functions necessary to allow users to be reached wherever they are in the network coverage area. In a cellular network, the service coverage area is divided into smaller hexagonal areas, referred to as cells. The cellular concept was introduced to enable reuse of radio frequencies. The continued expansion of cellular networks, coupled with an increasingly restricted mobile spectrum, has made the reduction of communication overhead a highly important issue. Much of this overhead traffic is spent determining the precise location of individual users when relaying calls, and the field of location management aims to reduce it by predicting user locations. This paper describes and compares various location management schemes in cellular networks.
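Distance-based location update is one scheme commonly compared in this area, alongside time-based and movement-based updates. In the sketch below, the terminal reports its location only after moving a threshold number of cells away from the last reported cell. Cells are addressed with axial coordinates on a hexagonal grid. The coordinate system and the threshold are illustrative assumptions, not parameters from the paper.

```python
def hex_distance(a, b):
    """Distance in cells between two hexagonal cells in axial (q, r) coordinates."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def location_updates(path, threshold):
    """Distance-based location management: report the current cell whenever the
    terminal is at least `threshold` cells from the last reported cell.
    Returns the cells at which updates were sent (including initial registration)."""
    last = path[0]
    updates = [last]
    for cell in path[1:]:
        if hex_distance(cell, last) >= threshold:
            last = cell
            updates.append(cell)
    return updates


print(location_updates([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)], threshold=2))
```

A larger threshold sends fewer updates but leaves a wider ring of cells to page when a call arrives. This update-versus-paging trade-off is what such schemes balance.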
Abstract: Electronic slides (e-slides) are currently one of the most common formats for educational presentations. Unfortunately, e-slides are rarely used by the visually impaired, since they cannot see the content of such slides, which usually consists of text, images and animation. This paper proposes a model for presenting e-slides multimodally, that is, displaying conventional slides concurrently with voicing, in both Malay and English. At the design level, the live multimedia presentation concept is used, while at the implementation level several components are used:

- the text content of each slide is extracted using a COM component;
- English text is voiced using the Microsoft Speech API;
- Malay text is voiced using a dictionary approach.

To support accessibility, an auditory user interface is provided as an additional feature. A prototype of this model, named VSlide, has been developed and introduced.