Abstract: The bird species most commonly found around human settlements in Turkey are crows (Corvus corone), pigeons (Columba livia), sparrows (Passer domesticus), starlings (Sturnus vulgaris) and blackbirds (Turdus merula). These birds damage agricultural areas and foul human living areas. To drive them away, various materials and methods are used, such as chemicals, treatments, colored lights, flashes and audible scarers. Many studies of chemical methods can be found in the literature, but few works on audible bird scarers have been reported. Therefore, a solar-powered audible bird scarer was designed, manufactured and tested in this experimental investigation. First, to gauge how sensitive these birds are to the audible scarer, a series of preliminary studies was conducted. These studies showed that crows are the most resistant to the audible bird scarer compared with pigeons, sparrows, starlings and blackbirds, so the solar-powered scarer was tested on crows. The scarer was tested for about one month during April and May 2007. Eighteen well-known predator sounds (voices or calls) were selected for the test, from Eleonora's falcon (Falco eleonorae), the rough-legged buzzard (Buteo lagopus), the golden eagle (Aquila chrysaetos), Montagu's harrier (Circus pygargus) and the pygmy owl (Glaucidium passerinum). The results showed that the birds' reaction varied with the predator sound type, the camouflage of the scarer, the sound quality and volume, and the loudspeaker play and pause periods within one application. In addition, the sound of the rough-legged buzzard (Buteo lagopus) was the most effective on crows, and the scarer proved sufficiently efficient.
Abstract: To create 3D sound with binaural systems, head-related transfer functions (HRTFs), which encode how sounds arrive at our ears, are generally used. However, owing to the characteristics of HRTFs, the three-dimensional effect can weaken within the cone of confusion between the front and back directions.
In this paper, we propose a new method based on psychoacoustic theory that reduces confusion in sound image localization. In the method, the HRTF spectral characteristics are enhanced using the energy ratio of the bark bands. Informal listening tests show that the proposed method improves front-back sound localization much more than conventional methods.
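The bark-band energy ratio used to enhance the HRTF spectrum can be illustrated with a minimal sketch: per-band spectral energies normalized by total energy, using Zwicker's bark approximation. This is a generic illustration of the quantity, not the authors' enhancement rule:

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker's approximation of the bark scale."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_energy_ratios(spectrum, sample_rate):
    """Energy in each bark band divided by total energy.

    `spectrum` is a one-sided magnitude spectrum (e.g. from np.fft.rfft).
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, len(spectrum))
    bands = np.floor(hz_to_bark(freqs)).astype(int)  # bark band index per bin
    energy = spectrum.astype(float) ** 2
    total = energy.sum()
    return np.array([energy[bands == b].sum() / total
                     for b in range(bands.max() + 1)])
```

A 1 kHz tone, for instance, concentrates its energy around bark band 8, consistent with the bark mapping.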
Abstract: This paper presents a new strategy for the identification and classification of pathological voices using a hybrid method based on the wavelet transform and neural networks. After speech is acquired from a patient, the signal is analyzed to extract acoustic parameters such as pitch, formants, jitter and shimmer. The results are then compared with normal reference values stored in a programmable database. Sounds are collected from healthy people and from patients, and then classified into two categories. The speech database consists of several pathological and normal voices collected from the national hospital "Rabta-Tunis". The speech processing algorithm runs in a supervised mode to discriminate normal from pathological voices, and then to classify neural versus vocal pathologies (Parkinson's, Alzheimer's, laryngeal disorders, dyslexia, etc.). Several simulation results are presented as a function of the disease and compared with the clinical diagnosis in order to obtain an objective evaluation of the developed tool.
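Jitter and shimmer, two of the acoustic parameters extracted above, have standard relative definitions that can be sketched directly. This is a minimal illustration over pre-extracted pitch periods and peak amplitudes, not the paper's full extraction pipeline:

```python
def local_jitter(periods):
    """Relative jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period (often reported in %)."""
    diffs = [abs(periods[i] - periods[i - 1]) for i in range(1, len(periods))]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Relative shimmer: the same measure applied to peak amplitudes."""
    diffs = [abs(amplitudes[i] - amplitudes[i - 1])
             for i in range(1, len(amplitudes))]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))
```

A perfectly regular voice gives zero jitter; pathological voices typically show elevated values of both measures.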
Abstract: The ''cocktail party problem'' refers to a well-known human auditory ability: we can pick out a specific sound we want to hear even when many undesirable sounds or noises are mixed in. Blind source separation (BSS) based on independent component analysis (ICA) is one method for separating a particular signal from such mixtures under simple assumptions. In this paper, we propose an online approach to blind source separation using the sliding DFT and time-domain independent component analysis. The proposed method reduces computational complexity compared with conventional methods and lends itself to parallel processing on digital signal processors (DSPs) and similar hardware. We evaluate the method and demonstrate its effectiveness.
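The sliding DFT at the heart of this kind of online approach updates each frequency bin in O(1) per incoming sample, instead of recomputing a full transform per window. A minimal sketch of that recurrence follows (illustrative only; the combination with time-domain ICA is not shown):

```python
import cmath

def sliding_dft(samples, N):
    """Yield the N-point DFT of each length-N window of `samples`,
    updating each bin in O(1) per new sample: X_k <- (X_k + x_new - x_old) * W^k."""
    twiddle = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
    # full DFT of the first window to initialize the recurrence
    X = [sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / N)
             for n in range(N)) for k in range(N)]
    yield list(X)
    for i in range(N, len(samples)):
        x_new, x_old = samples[i], samples[i - N]
        X = [(X[k] + x_new - x_old) * twiddle[k] for k in range(N)]
        yield list(X)
```

Each yielded list matches the direct DFT of the current window, but the per-sample cost is O(N) over all bins rather than O(N log N) for a fresh FFT.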
Abstract: Bangla vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition. In this paper, Bangla vowels in isolated words are analyzed based on a speech production model within an analysis-by-synthesis framework. This leads to the extraction of spectral parameters for the production model that produce the different Bangla vowel sounds. The real and synthetic spectra are compared, and a weighted square error is computed along with the error in the formant bandwidths for an efficient representation of Bangla vowels. The extracted features produce a good representation of the targeted vowels. Such a representation also plays an essential role in low-bit-rate speech coding and vocoders.
Abstract: The goal of our work is discrimination between different classes of environmental sounds. A sound recognition system offers concrete potential for surveillance and security applications. The paper's first contribution to this research field is a thorough investigation of the applicability of state-of-the-art audio features to environmental sound recognition. Additionally, a set of novel features obtained by combining the basic parameters is introduced. The quality of the investigated features is evaluated with an HMM-based classifier, to which particular attention is devoted. Specifically, we propose a multi-style training system based on HMMs: one recognizer is trained on a database containing different levels of background noise and is used as a universal recognizer for every environment. To enhance the system's robustness by reducing environmental variability, we explore different adaptation algorithms, including Maximum Likelihood Linear Regression (MLLR), Maximum A Posteriori (MAP) adaptation, and the combined MAP/MLLR algorithm. Experimental evaluation shows that a rather good recognition rate can be reached, even under severe noise degradation, when the system is fed with a suitable set of features.
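Of the adaptation algorithms listed, MAP adaptation has a particularly compact core: each Gaussian mean is interpolated between its prior value and the sample mean of the adaptation data, weighted by a relevance factor. A minimal one-dimensional sketch (using hard counts instead of HMM occupation probabilities, which is a simplification):

```python
def map_adapt_mean(prior_mean, adaptation_data, tau):
    """MAP update of a Gaussian mean: with little data the prior mean
    dominates; with much data the estimate approaches the sample mean.
    `tau` is the relevance factor controlling the interpolation."""
    n = len(adaptation_data)
    data_sum = sum(adaptation_data)
    return (tau * prior_mean + data_sum) / (tau + n)
```

With tau equal to the data count, the result sits exactly halfway between the prior mean and the sample mean; as tau shrinks, the estimate converges to the sample mean.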
Abstract: One astonishing human capability is recognizing thousands of different objects visually and learning the semantic associations between those objects and the words that refer to them. This work attempts to build a computational model of that capacity, simulating the process by which infants learn to recognize objects and words through exposure to visual stimuli and vocal sounds. One of the main facts shaping the brain of a newborn is that lights and colors come from entities in the world. Gradually, the visual system learns which light sensations belong to the same entities, despite large changes in appearance. This experience is common to humans and several other mammals, such as non-human primates. But only humans can recognize a huge variety of objects, many of them manufactured by humans themselves, and use sounds to identify and categorize them. The aim of this model is to reproduce these processes in a biologically plausible way, by reconstructing the essential hierarchy of cortical circuits along the visual and auditory neural pathways.
Abstract: The acoustic and articulatory properties of fricative speech sounds are being studied using magnetic resonance imaging (MRI) and acoustic recordings from a single subject. Area functions were derived from a complete set of axial and coronal MR slices using two different methods: the Mermelstein technique and the Blum transform. Area functions derived from the two techniques were shown to differ significantly in some cases. Such differences will lead to different acoustic predictions and it is important to know which is the more accurate. The vocal tract acoustic transfer function (VTTF) was derived from these area functions for each fricative and compared with measured speech signals for the same fricative and same subject. The VTTFs for /f/ in two vowel contexts and the corresponding acoustic spectra are derived here; the Blum transform appears to show a better match between prediction and measurement than the Mermelstein technique.
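One standard way to relate an area function to acoustics is the lossless concatenated-tube model, in which each junction between adjacent sections contributes a reflection coefficient. This is offered only as a hedged illustration of how area functions feed acoustic prediction; deriving a full vocal tract transfer function, as the paper does, involves more than this:

```python
def reflection_coefficients(areas):
    """Kelly-Lochbaum reflection coefficients at each junction of a
    lossless concatenated-tube model of the vocal tract:
    r_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
            for i in range(len(areas) - 1)]
```

A uniform tube yields all-zero coefficients (no reflections), while a constriction such as the one behind a fricative produces a large positive coefficient, which is why errors in the measured areas change the predicted transfer function.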
Abstract: This paper presents the architecture and graphical user interface (GUI) design of a traditional Thai musical instrument application for tablet computers, for practicing the "Ranaad Ek", a trough-resonated keyboard percussion instrument. The application lets a player use percussion techniques much as on a physical instrument and offers two playing modes. The first is a free-playing mode, in which a player can freely use multi-touch on the wooden bars to produce the instrument's sounds. The second is a practice mode that guides the player through the percussion patterns and rhythms of practice songs. The application meets its requirements and specifications.
Abstract: Heart sound is an acoustic signal, and many techniques now used for human recognition tasks borrow from speech recognition. One popular choice for feature extraction from acoustic signals is the Mel Frequency Cepstral Coefficients (MFCC), which map the signal onto a non-linear mel scale that mimics human hearing. However, the mel scale is almost linear in the frequency region of heart sounds and should therefore produce results similar to standard cepstral coefficients (CC). In this paper, MFCC is investigated to see whether it produces superior results for a PCG-based human identification system compared with CC. Results show that the MFCC system is still superior to CC despite the linear filter-banks in the lower frequency range, giving up to 95% correct recognition for MFCC versus 90% for CC. Further experiments show that the high recognition rate is due to the implementation of the filter-banks and not to mel scaling.
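The claim that the mel scale is almost linear in the heart-sound frequency region can be checked directly from the common HTK-style mapping (a sketch of the standard formula; the paper's exact filter-bank design is not reproduced):

```python
import math

def hz_to_mel(f):
    """Common HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)
```

Doubling a heart-sound frequency from 50 Hz to 100 Hz nearly doubles the mel value (near-linear regime), whereas doubling 4 kHz to 8 kHz increases the mel value by far less, which is where mel warping actually matters for speech.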
Abstract: Detection of human emotions has many potential applications. One application is to quantify audience attentiveness in order to evaluate the acoustic quality of a concert hall, using subjective audio preferences obtained from the audience. To make this evaluation of acoustic quality fair, this research proposes a system for multimodal emotion detection: one modality based on brain signals measured with an electroencephalogram (EEG), and a second modality based on sequences of facial images. In the experiment, an audio signal consisting of normal and distorted sounds was prepared and played to stimulate positive or negative emotional feedback from the volunteers. EEG signals from the temporal lobes (electrodes T3 and T4) were used to measure the brain response, and sequences of facial images were used to monitor facial expressions while the volunteers listened to the audio signal. From the EEG signal, features were extracted from changes in the brain waves, particularly the alpha and beta waves. Facial expression features were extracted by analyzing motion in the images. We implemented an advanced optical flow method to detect the most active facial muscles as the expression shifts from neutral to another emotion, represented as vector flow maps. To simplify the detection of the emotional state, the vector flow maps are transformed into a compass mapping that represents the major directions and velocities of facial movement. The results showed that the power of the beta wave increases when the distorted sound stimulus is presented, although each volunteer gave different emotional feedback. Based on the features derived from the facial images, optical flow compass mapping is promising as additional information for deciding on emotional feedback.
Abstract: In this paper, the main principles of a text-to-speech synthesis system are presented, and the problems that arise when developing such a system are described. The approaches used and their application in speech synthesis systems for the Azerbaijani language are shown.
Abstract: In this paper we propose a robust environmental sound classification approach based on spectrogram features derived from log-Gabor filters. The approach includes two methods. In the first method, the spectrograms are passed through an appropriate log-Gabor filter bank, and the outputs are averaged and subjected to an optimal feature selection procedure based on a mutual information criterion. The second method uses the same steps but applies them only to three patches extracted from each spectrogram.
To investigate the accuracy of the proposed methods, we conducted experiments using a large database containing 10 environmental sound classes. Classification results based on multiclass Support Vector Machines show that the second method is the more efficient, with an average classification accuracy of 89.62%.
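The radial transfer function that log-Gabor filter banks are built from can be sketched in one dimension as follows. The bandwidth parameter here is an assumed illustrative value, and the 2D filtering of spectrogram patches is not reproduced:

```python
import math

def log_gabor(f, f0, sigma_ratio=0.75):
    """Radial log-Gabor transfer function with centre frequency f0.
    `sigma_ratio` (sigma / f0, an assumed value here) sets the bandwidth.
    The Gaussian on a log-frequency axis gives zero DC response."""
    if f <= 0:
        return 0.0
    return math.exp(-(math.log(f / f0) ** 2) /
                    (2.0 * math.log(sigma_ratio) ** 2))
```

The response peaks at the centre frequency and is symmetric on a log axis, so one octave above and one octave below the centre are attenuated equally, which is the property that distinguishes log-Gabor from ordinary Gabor filters.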
Abstract: The purpose of this Classifying Bird Sounds (chip notes) project is to reduce unwanted noise in recorded bird-sound chip notes, to design a scheme for detecting differences and similarities between recorded chip notes, and to classify bird-sound chip notes. Technologies for determining the similarity of sound waves have been used in communication, sound engineering and wireless sound applications for many years. Our research focuses on the similarity of chip notes, i.e., the sounds produced by different birds. The program we use is written in Microsoft C#.
Abstract: This paper presents the cepstral and trispectral analysis of speech signals produced by men with normal hearing, men with hearing impairments (deaf and profoundly deaf) and others who have undergone tracheotomy. The trispectral analysis is based on parametric (autoregressive, AR) methods using the fourth-order cumulant. These analyses are used to detect and compare the pitch and the formants of the corresponding voiced sounds (the vowels /a/, /i/ and /u/). The first results appear promising: after several experiments, it seems there is no deformation of the spectrum, as one might have supposed at the outset. These pathologies do, however, influence two characteristics: defective hearing affects the formants, whereas tracheotomy affects the fundamental frequency (pitch).
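Cepstral pitch detection, the basis of the pitch comparisons above, can be sketched in a few lines. This is a generic textbook version, not the paper's trispectral machinery, and the pitch search range is an assumption:

```python
import numpy as np

def cepstral_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame from the
    peak of its real cepstrum within a plausible pitch range."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    q_lo = int(sample_rate / fmax)   # shortest period considered
    q_hi = int(sample_rate / fmin)   # longest period considered
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return sample_rate / peak
```

A pulse train with a 32-sample period at 8 kHz, for example, yields a cepstral peak at quefrency 32 and hence an estimate near 250 Hz.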
Abstract: The present work addresses the automatic enumeration and recognition of an unknown and time-varying number of environmental sound sources using a single microphone. We assume that the recorded sound is a realization of sound sources belonging to a group of audio classes known a priori. We describe two variations of the same principle, which is to compute the distance between the current unknown audio frame and all possible combinations of the classes assumed to span the sound scene. We concentrate on categorizing environmental sound sources, such as birds and insects, in the task of monitoring the biodiversity of a specific habitat.
Abstract: Auscultation sound contains various sounds generated in the chest. The Adaptive Noise Canceller (ANC) is a useful technique for biomedical signals, but it is not suitable for auscultation sound: the ANC needs two input channels, a primary signal and a reference signal, whereas a stethoscope provides only one input sound. Therefore, this paper proposes a Single Input ANC (SIANC) for the suppression of breath sounds in cardiac auscultation. For the SIANC, a reference generation system is proposed that comprises a heart sound detector, a controller and a reference generator. Experiments and comparisons confirmed that the proposed SIANC is effective for heart sound enhancement and is independent of variations in the heartbeat.
Abstract: Studies of vocal communication in the Sooty-headed Bulbul were carried out from January to December 2011. Vocal recordings and behavioral observations were made in the birds' natural habitats at several localities in Lampang, Thailand. After editing, high-quality cuts of the recordings were analyzed with Avisoft-SASLab Pro (version 4.40) software. More than one thousand element repertoires in five groups were found within two vocal structures: short sounds with a single element, and phrases composed of elements, with frequencies ranging from 1 to 10 kHz. Most phrases were composed of 2 to 5 elements that were often dissimilar in structure; however, these phrases were not as complex as song phrases. The elements and phrases were combined to form many patterns. The species used ten types of calls: alert, alarm, aggressive, begging, contact, courtship, distress, excitement, flying and invitation calls. Alert and contact calls were used more frequently than the others. Aggressive, alarm and distress calls could serve for interspecific communication with some other bird species in the same habitats.
Abstract: Recent theories of the cognitive process of moral judgment have focused on the role of intuitions and emotions, marking a departure from the previous emphasis on conscious, step-by-step reasoning. My study investigated how being in a disgusted mood state affects moral judgment.
Participants were induced into a disgusted mood state by listening to disgusting sounds and reading disgusting descriptions. Results show that, compared with controls who were not induced to feel disgust, they were more likely to endorse actions that are emotionally aversive but maximize utilitarian returns.
The results are analyzed using the 'emotion-as-information' approach to decision making and are consistent with the view that emotions play an important role in determining moral judgment.
Abstract: Personal name matching is a core task in national citizen databases, text and web mining, information retrieval, online library systems, e-commerce and record linkage systems, and has prompted extensive research on name matching. Traditional name matching methods are suited to English and other Latin-based languages. Asian languages without explicit word boundaries, such as Myanmar, still require a sounds-alike matching system for Unicode-based applications. Hence, we propose a matching algorithm that produces sounds-alike (phonetic) patterns suited to Myanmar character spelling. Given the nature of Myanmar characters, we account for word boundary fragmentation and the collation of characters, using a pattern conversion algorithm that builds word patterns from the fragmented and collated forms. We also create Myanmar sounds-alike phonetic groups to support the phonetic matching. Experimental results show a fragmentation accuracy of 99.32% and a processing time of 1.72 ms.
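As a reference point, the traditional English-oriented sounds-alike matching that the abstract contrasts with is typified by Soundex. A minimal sketch of that classic algorithm follows; it illustrates the phonetic-group idea only, and is not the proposed Myanmar algorithm:

```python
def soundex(name):
    """Classic Soundex code: first letter plus three digits, where
    similar-sounding consonants share a digit and vowels are dropped."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    first = name[0].upper()
    digits = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]
```

"Robert" and "Rupert" both map to R163, so a lookup by code finds them as sounds-alike matches; the Myanmar system plays the analogous role for Myanmar-script names.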