A High Quality Speech Coder at 600 bps

This paper presents a vocoder to obtain high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model which extracts few coding parameters, and three consecutive frames are grouped into a superframe and jointly vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4kbps LPC10e and achieves approximately the same as that of 2.4kbps MELP and with high robustness.

Voice Command Recognition System Based on MFCC and VQ Algorithms

The goal of this project is to design a system to recognition voice commands. Most of voice recognition systems contain two main modules as follow “feature extraction" and “feature matching". In this project, MFCC algorithm is used to simulate feature extraction module. Using this algorithm, the cepstral coefficients are calculated on mel frequency scale. VQ (vector quantization) method will be used for reduction of amount of data to decrease computation time. In the feature matching stage Euclidean distance is applied as similarity criterion. Because of high accuracy of used algorithms, the accuracy of this voice command system is high. Using these algorithms, by at least 5 times repetition for each command, in a single training session, and then twice in each testing session zero error rate in recognition of commands is achieved.

Transformation of Vocal Characteristics: A Review of Literature

The transformation of vocal characteristics aims at modifying voice such that the intelligibility of aphonic voice is increased or the voice characteristics of a speaker (source speaker) to be perceived as if another speaker (target speaker) had uttered it. In this paper, the current state-of-the-art voice characteristics transformation methodology is reviewed. Special emphasis is placed on voice transformation methodology and issues for improving the transformed speech quality in intelligibility and naturalness are discussed. In particular, it is suggested to use the modulation theory of speech as a base for research on high quality voice transformation. This approach allows one to separate linguistic, expressive, organic and perspective information of speech, based on an analysis of how they are fused when speech is produced. Therefore, this theory provides the fundamentals not only for manipulating non-linguistic, extra-/paralinguistic and intra-linguistic variables for voice transformation, but also for paving the way for easily transposing the existing voice transformation methods to emotion-related voice quality transformation and speaking style transformation. From the perspectives of human speech production and perception, the popular voice transformation techniques are described and classified them based on the underlying principles either from the speech production or perception mechanisms or from both. In addition, the advantages and limitations of voice transformation techniques and the experimental manipulation of vocal cues are discussed through examples from past and present research. Finally, a conclusion and road map are pointed out for more natural voice transformation algorithms in the future.

On Preprocessing of Speech Signals

Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the utilization of the 3 σ edit rule, and the Hampel Identifier which are compared with the conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies on some test voice signals are encouraging.

Precision Identification of Nonlinear Damping Parameter for a Miniature Moving-Coil Transducer

The nonlinear damping behavior is usually ignored in the design of a miniature moving-coil loudspeaker. But when the loudspeaker operated in air, the damping parameter varies with the voice-coil displacement corresponding due to viscous air flow. The present paper presents an identification model as inverse problem to identify the nonlinear damping parameter in the lumped parameter model for the loudspeaker. Theoretical results for the nonlinear damping are verified by using laser displacement measurement scanner. These results indicate that the damping parameter has the greatly different nonlinearity between in air and vacuum. It is believed that the results of the present work can be applied in diagnosis and sound quality improvement of a miniature loudspeaker.

A Supervised Text-Independent Speaker Recognition Approach

We provide a supervised speech-independent voice recognition technique in this paper. In the feature extraction stage we propose a mel-cepstral based approach. Our feature vector classification method uses a special nonlinear metric, derived from the Hausdorff distance for sets, and a minimum mean distance classifier.

Speech Activated Automation

This article presents a simple way to perform programmed voice commands for the interface with commercial Digital and Analogue Input/Output PCI cards, used in Robotics and Automation applications. Robots and Automation equipment can "listen" to voice commands and perform several different tasks, approaching to the human behavior, and improving the human- machine interfaces for the Automation Industry. Since most PCI Digital and Analogue Input/Output cards are sold with several DLLs included (for use with different programming languages), it is possible to add speech recognition capability, using a standard speech recognition engine, compatible with the programming languages used. It was created in this work a Visual Basic 6 (the world's most popular language) application, that listens to several voice commands, and is capable to communicate directly with several standard 128 Digital I/O PCI Cards, used to control complete Automation Systems, with up to (number of boards used) x 128 Sensors and/or Actuators.

The CEO Mission II, Rescue Robot with Multi-Joint Mechanical Arm

This paper presents design features of a rescue robot, named CEO Mission II. Its body is designed to be the track wheel type with double front flippers for climbing over the collapse and the rough terrain. With 125 cm. long, 5-joint mechanical arm installed on the robot body, it is deployed not only for surveillance from the top view but also easier and faster access to the victims to get their vital signs. Two cameras and sensors for searching vital signs are set up at the tip of the multi-joint mechanical arm. The third camera is at the back of the robot for driving control. Hardware and software of the system, which controls and monitors the rescue robot, are explained. The control system is used for controlling the robot locomotion, the 5-joint mechanical arm, and for turning on/off devices. The monitoring system gathers all information from 7 distance sensors, IR temperature sensors, 3 CCD cameras, voice sensor, robot wheels encoders, yawn/pitch/roll angle sensors, laser range finder and 8 spare A/D inputs. All sensors and controlling data are communicated with a remote control station via IEEE 802.11b Wi-Fi. The audio and video data are compressed and sent via another IEEE 802.11g Wi-Fi transmitter for getting real-time response. At remote control station site, the robot locomotion and the mechanical arm are controlled by joystick. Moreover, the user-friendly GUI control program is developed based on the clicking and dragging method to easily control the movement of the arm. Robot traveling map is plotted from computing the information of wheel encoders and the yawn/pitch data. 2D Obstacle map is plotted from data of the laser range finder. The concept and design of this robot can be adapted to suit many other applications. As the Best Technique awardee from Thailand Rescue Robot Championship 2006, all testing results are satisfied.

Traffic Behaviour of VoIP in a Simulated Access Network

Insufficient Quality of Service (QoS) of Voice over Internet Protocol (VoIP) is a growing concern that has lead the need for research and study. In this paper we investigate the performance of VoIP and the impact of resource limitations on the performance of Access Networks. The impact of VoIP performance in Access Networks is particularly important in regions where Internet resources are limited and the cost of improving these resources is prohibitive. It is clear that perceived VoIP performance, as measured by mean opinion score [2] in experiments, where subjects are asked to rate communication quality, is determined by end-to-end delay on the communication path, delay variation, packet loss, echo, the coding algorithm in use and noise. These performance indicators can be measured and the affect in the Access Network can be estimated. This paper investigates the congestion in the Access Network to the overall performance of VoIP services with the presence of other substantial uses of internet and ways in which Access Networks can be designed to improve VoIP performance. Methods for analyzing the impact of the Access Network on VoIP performance will be surveyed and reviewed. This paper also considers some approaches for improving performance of VoIP by carrying out experiments using Network Simulator version 2 (NS2) software with a view to gaining a better understanding of the design of Access Networks.

Speech Coding and Recognition

This paper investigates the performance of a speech recognizer in an interactive voice response system for various coded speech signals, coded by using a vector quantization technique namely Multi Switched Split Vector Quantization Technique. The process of recognizing the coded output can be used in Voice banking application. The recognition technique used for the recognition of the coded speech signals is the Hidden Markov Model technique. The spectral distortion performance, computational complexity, and memory requirements of Multi Switched Split Vector Quantization Technique and the performance of the speech recognizer at various bit rates have been computed. From results it is found that the speech recognizer is showing better performance at 24 bits/frame and it is found that the percentage of recognition is being varied from 100% to 93.33% for various bit rates.

Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time

This work presents a novel means of extracting fixedlength parameters from voice signals, such that words can be recognized in linear time. The power and the zero crossing rate are first calculated segment by segment from a voice signal; by doing so, two feature sequences are generated. We then construct an FIR system across these two sequences. The parameters of this FIR system, used as the input of a multilayer proceptron recognizer, can be derived by recursive LSE (least-square estimation), implying that the complexity of overall process is linear to the signal size. In the second part of this work, we introduce a weighting factor λ to emphasize recent input; therefore, we can further recognize continuous speech signals. Experiments employ the voice signals of numbers, from zero to nine, spoken in Mandarin Chinese. The proposed method is verified to recognize voice signals efficiently and accurately.

Bandwidth Estimation Algorithms for the Dynamic Adaptation of Voice Codec

In the recent years multimedia traffic and in particular VoIP services are growing dramatically. We present a new algorithm to control the resource utilization and to optimize the voice codec selection during SIP call setup on behalf of the traffic condition estimated on the network path. The most suitable methodologies and the tools that perform realtime evaluation of the available bandwidth on a network path have been integrated with our proposed algorithm: this selects the best codec for a VoIP call in function of the instantaneous available bandwidth on the path. The algorithm does not require any explicit feedback from the network, and this makes it easily deployable over the Internet. We have also performed intensive tests on real network scenarios with a software prototype, verifying the algorithm efficiency with different network topologies and traffic patterns between two SIP PBXs. The promising results obtained during the experimental validation of the algorithm are now the basis for the extension towards a larger set of multimedia services and the integration of our methodology with existing PBX appliances.

Online Collaborative Learning System Using Speech Technology

A Web-based learning tool, the Learn IN Context (LINC) system, designed and being used in some institution-s courses in mixed-mode learning, is presented in this paper. This mode combines face-to-face and distance approaches to education. LINC can achieve both collaborative and competitive learning. In order to provide both learners and tutors with a more natural way to interact with e-learning applications, a conversational interface has been included in LINC. Hence, the components and essential features of LINC+, the voice enhanced version of LINC, are described. We report evaluation experiments of LINC/LINC+ in a real use context of a computer programming course taught at the Université de Moncton (Canada). The findings show that when the learning material is delivered in the form of a collaborative and voice-enabled presentation, the majority of learners seem to be satisfied with this new media, and confirm that it does not negatively affect their cognitive load.

Performance Analysis of Digital Signal Processors Using SMV Benchmark

Unlike general-purpose processors, digital signal processors (DSP processors) are strongly application-dependent. To meet the needs for diverse applications, a wide variety of DSP processors based on different architectures ranging from the traditional to VLIW have been introduced to the market over the years. The functionality, performance, and cost of these processors vary over a wide range. In order to select a processor that meets the design criteria for an application, processor performance is usually the major concern for digital signal processing (DSP) application developers. Performance data are also essential for the designers of DSP processors to improve their design. Consequently, several DSP performance benchmarks have been proposed over the past decade or so. However, none of these benchmarks seem to have included recent new DSP applications. In this paper, we use a new benchmark that we recently developed to compare the performance of popular DSP processors from Texas Instruments and StarCore. The new benchmark is based on the Selectable Mode Vocoder (SMV), a speech-coding program from the recent third generation (3G) wireless voice applications. All benchmark kernels are compiled by the compilers of the respective DSP processors and run on their simulators. Weighted arithmetic mean of clock cycles and arithmetic mean of code size are used to compare the performance of five DSP processors. In addition, we studied how the performance of a processor is affected by code structure, features of processor architecture and optimization of compiler. The extensive experimental data gathered, analyzed, and presented in this paper should be helpful for DSP processor and compiler designers to meet their specific design goals.

A Security Model of Voice Eavesdropping Protection over Digital Networks

The purpose of this research is to develop a security model for voice eavesdropping protection over digital networks. The proposed model provides an encryption scheme and a personal secret key exchange between communicating parties, a so-called voice data transformation system, resulting in a real-privacy conversation. The operation of this system comprises two main steps as follows: The first one is the personal secret key exchange for using the keys in the data encryption process during conversation. The key owner could freely make his/her choice in key selection, so it is recommended that one should exchange a different key for a different conversational party, and record the key for each case into the memory provided in the client device. The next step is to set and record another personal option of encryption, either taking all frames or just partial frames, so-called the figure of 1:M. Using different personal secret keys and different sets of 1:M to different parties without the intervention of the service operator, would result in posing quite a big problem for any eavesdroppers who attempt to discover the key used during the conversation, especially in a short period of time. Thus, it is quite safe and effective to protect the case of voice eavesdropping. The results of the implementation indicate that the system can perform its function accurately as designed. In this regard, the proposed system is suitable for effective use in voice eavesdropping protection over digital networks, without any requirements to change presently existing network systems, mobile phone network and VoIP, for instance.

Performance Study on Audio Codec and Session Transfer of Open Source VoIP applications

Voice over Internet Protocol (VoIP) application or commonly known as softphone has been developing an increasingly large market in today-s telecommunication world and the trend is expected to continue with the enhancement of additional features. This includes leveraging on the existing presence services, location and contextual information to enable more ubiquitous and seamless communications. In this paper, we discuss the concept of seamless session transfer for real-time application such as VoIP and IPTV, and our prototype implementation of such concept on a selected open source VoIP application. The first part of this paper is about conducting performance evaluation and assessments across some commonly found open source VoIP applications that are Ekiga, Kphone, Linphone and Twinkle so as to identify one of them for implementing our design of seamless session transfer. Subjective testing has been carried out to evaluate the audio performance on these VoIP applications and rank them according to their Mean Opinion Score (MOS) results. The second part of this paper is to discuss on the performance evaluations of our prototype implementation of session transfer using Linphone.

Gait Recognition System: Bundle Rectangle Approach

Biometrics methods include recognition techniques such as fingerprint, iris, hand geometry, voice, face, ears and gait. The gait recognition approach has some advantages, for example it does not need the prior concern of the observed subject and it can record many biometric features in order to make deeper analysis, but most of the research proposals use high computational cost. This paper shows a gait recognition system with feature subtraction on a bundle rectangle drawn over the observed person. Statistical results within a database of 500 videos are shown.

Review of a Real-Time Infectious Waste Management System Using QR Code

In the management of industrial waste, conversion from the use of paper invoices to electronic forms is currently under way in developed countries. Difficulties in such computerization include the lack of synchronization between the actual goods and the corresponding data managed by the server. Consequently, a system which utilizes the incorporation of a QR code in connection with the waste material has been developed. The code is read at each stage, from discharge until disposal, and progress at each stage can be easily reported. This system can be linked with Japanese public digital authentication service of waste, taking advantage of its good points, and can be used to submit reports to the regulatory authorities. Its usefulness was confirmed by a verification test, and put into actual practice.

Location Management in Cellular Networks

Cellular networks provide voice and data services to the users with mobility. To deliver services to the mobile users, the cellular network is capable of tracking the locations of the users, and allowing user movement during the conversations. These capabilities are achieved by the location management. Location management in mobile communication systems is concerned with those network functions necessary to allow the users to be reached wherever they are in the network coverage area. In a cellular network, a service coverage area is divided into smaller areas of hexagonal shape, referred to as cells. The cellular concept was introduced to reuse the radio frequency. Continued expansion of cellular networks, coupled with an increasingly restricted mobile spectrum, has established the reduction of communication overhead as a highly important issue. Much of this traffic is used in determining the precise location of individual users when relaying calls, with the field of location management aiming to reduce this overhead through prediction of user location. This paper describes and compares various location management schemes in the cellular networks.

Development of Multimodal e-Slide Presentation to Support Self-Learning for the Visually Impaired

Currently electronic slide (e-slide) is one of the most common styles in educational presentation. Unfortunately, the utilization of e-slide for the visually impaired is uncommon since they are unable to see the content of such e-slides which are usually composed of text, images and animation. This paper proposes a model for presenting e-slide in multimodal presentation i.e. using conventional slide concurrent with voicing, in both languages Malay and English. At the design level, live multimedia presentation concept is used, while at the implementation level several components are used. The text content of each slide is extracted using COM component, Microsoft Speech API for voicing the text in English language and the text in Malay language is voiced using dictionary approach. To support the accessibility, an auditory user interface is provided as an additional feature. A prototype of such model named as VSlide has been developed and introduced.