Abstract: Music is ubiquitous in human life. From the moment the foetus hears sounds inside the mother’s womb, and later when the newborn experiences alluring sounds, a curiosity for learning emerges and evokes exploration. Music is an education rather than mere entertainment. The intricate balance between music, education and entertainment has been well recognized by the scientific community and is being explored as a viable tool to understand and improve human cognition. There are seven basic swaras (notes), Sa, Ri, Ga, Ma, Pa, Da and Ni, in the Carnatic music system that are analogous to C, D, E, F, G, A and B of the western system. Carnatic music builds on the conscious use of microtones, gamakams (oscillations) and rendering styles that evolved over centuries and established its standing. The complex but erudite raga system has been designed through elaborate experiments on srutis (musical sounds) and human perceptual abilities. In parallel, ‘rasa’, the emotions evoked by certain srutis and hence by the ragas, has been solidified through the power of language in combination with musical sounds. Carnatic music branches into Kalpita sangeetam (pre-composed music) and Manodharma sangeetam (improvised music). This article explores Manodharma sangeetam and its subdivisions: raga alapana, swara kalpana, neraval and ragam-tanam-pallavi (RTP). The mathematical strategies intrinsic to its practice methods for improvising music are discussed in detail with concert examples. Techniques of swara weaving for swara kalpana rendering and methods of alapana development are also discussed at length, with an emphasis on their impact on human cognitive abilities. Articulating the outlined conscious practice methods not only helps to leave a long-lasting melodic impression on listeners but also stimulates cognitive development.
Abstract: The aim of this experimental and numerical study is to analyze the effects of acoustic streaming generated by 40 kHz ultrasonic waves on heat transfer in forced convection, with and without 40 PPI aluminum metal foam. Preliminary dynamic and thermal studies were carried out with COMSOL Multiphysics to quantify the heat transfer enhancement obtained by inserting a 40 PPI metal foam (10 × 2 × 3 cm) on a heat sink, after its permeability and Forchheimer coefficient had been determined experimentally. The numerical results agree with the experimental ones, with an enhancement factor of 205% at a velocity of 0.4 m/s compared to an empty channel. The influence of 40 kHz ultrasound on heat transfer was also tested with and without metal foam. The results show a marked increase in Nusselt number in the empty channel, with an enhancement factor of 37.5%, whereas ultrasound has no influence on heat transfer when the metal foam is present.
Abstract: Raga, the soul and basis of the music system, is a distinctive musical entity with a unique structure in its construction of srutis (musical sounds) and their application. Another essential component of the music system is the ‘tala’, which defines the rhythm of a song. There are seven basic swaras (notes), Sa, Ri, Ga, Ma, Pa, Da and Ni, in the Carnatic music system that are analogous to C, D, E, F, G, A and B of the western system. Carnatic music further builds on the conscious use of microtones, gamakams (oscillations) and rendering styles. It has 72 basic ragas known as melakarta ragas, and a plethora of ragas have been developed from them through permutations and combinations of the basic swaras. Among them, some ragas derived from the same melakarta raga are distinctly different from each other and can evoke a profound difference in raga bhava (emotion) during rendering. Although these may share similar arohana and avarohana swaras, quintessential differences in their use of gamakas and in the srutis present offer varied melodic feelings; variations in the intonation and stress given to certain swara phrases are the root causes. This article sheds light on a group of such allied ragas (AR) from the perspectives of their schema and raga alapana (improvisation), ranjaka prayogas (signature phrases), differences in rendering tempo, gamakas and delicate srutis, along with the range of sancharas (musical phrases). The intricate differences in sruti frequencies and the use of AR in composing kritis (musical compositions) toward emotive ends such as moods of valor, kindness, love, humor, anger and mercy, to name a few, are also explored. A brief review of existing scientific research on music therapy with some of the Carnatic ragas is presented. Studying and comprehending the AR enables music aspirants to gain a thorough knowledge of the subtle nuances among the ragas. Such knowledge helps leave a long-lasting melodic impression on listeners and enables further research on music therapy.
Abstract: Certain systems can function well only if they recognize the sound environment as humans do. In this research, we focus on sound classification by adopting a convolutional neural network and aim to develop a method that automatically classifies various environmental sounds. Although the neural network is a powerful technique, the performance depends on the type of input data. Therefore, we propose an approach via a slice bispectrogram, which is a third-order spectrogram and is a slice version of the amplitude for the short-time bispectrum. This paper explains the slice bispectrogram and discusses the effectiveness of the derived method by evaluating the experimental results using the ESC‑50 sound dataset. As a result, the proposed scheme gives high accuracy and stability. Furthermore, some relationship between the accuracy and non-Gaussianity of sound signals was confirmed.
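As a rough illustration of the input representation described above, the following sketch computes the magnitude of one slice of the short-time bispectrum, frame by frame. The choice of the diagonal slice (f1 = f2), the naive DFT, the absence of windowing, and all function names are assumptions made for illustration; the abstract does not state which slice or framing the authors use.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]

def diagonal_bispectrum_slice(frame):
    """Magnitude of B(k, k) = X[k] * X[k] * conj(X[2k]) for 0 <= 2k <= N/2.

    Assumption: the "slice" is taken along the diagonal f1 = f2.
    """
    spectrum = dft(frame)
    n = len(frame)
    return [abs(spectrum[k] ** 2 * spectrum[2 * k].conjugate())
            for k in range(n // 4 + 1)]

def slice_bispectrogram(signal, frame_len, hop):
    """Stack per-frame slices into a time-by-frequency list of lists."""
    return [diagonal_bispectrum_slice(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

Because the bispectrum is a third-order statistic, it vanishes on average for Gaussian signals, which may be one reason a relationship with non-Gaussianity shows up in the results. A matrix produced this way could be fed to a convolutional network in place of an ordinary spectrogram.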
Abstract: The separation of speech signals has become a research
hotspot in the field of signal processing in recent years. It has
many applications and influences in teleconferencing, hearing aids,
speech recognition of machines and so on. The sounds received are
usually noisy. The issue of identifying the sounds of interest and
obtaining clear sounds in such an environment becomes a problem
worth exploring, that is, the problem of blind source separation.
This paper focuses on the under-determined blind source separation
(UBSS). Sparse component analysis is generally used for the problem
of under-determined blind source separation. The method is mainly
divided into two parts. Firstly, the clustering algorithm is used to
estimate the mixing matrix according to the observed signals. Then
the signal is separated based on the known mixing matrix. In this
paper, the problem of mixing matrix estimation is studied. This paper
proposes an improved algorithm to estimate the mixing matrix for
speech signals in the UBSS model. The traditional potential algorithm
is not accurate for the mixing matrix estimation, especially at low
signal-to-noise ratio (SNR). In response to this problem, this paper
considers the idea of an improved potential function method to
estimate the mixing matrix. The algorithm not only avoids the influence
of insufficient prior information in traditional clustering algorithms,
but also improves the estimation accuracy of the mixing matrix. This
paper takes the mixing of four speech signals into two channels as
an example. The results of simulations show that the approach in this
paper not only improves the accuracy of estimation, but also applies
to any mixing matrix.
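The general idea of estimating a mixing matrix from a potential function can be sketched as follows for the two-channel case. This is the plain, classical form of the approach under strong assumptions (sources are sparse enough that most samples lie along one mixing direction, a triangular kernel, a fixed angular grid); it is not the improved algorithm proposed in the paper, and the function names and parameters are hypothetical.

```python
import math

def direction_potential(samples, grid_size=180, sharpness=20.0):
    """Accumulate a potential over candidate directions in [0, pi).

    Each 2-channel sample votes for its own direction, weighted by its norm,
    through a triangular kernel (the kernel shape is an assumption).
    """
    angles = [math.pi * k / grid_size for k in range(grid_size)]
    potential = [0.0] * grid_size
    for x1, x2 in samples:
        norm = math.hypot(x1, x2)
        if norm == 0.0:
            continue
        # A direction and its opposite describe the same mixing column.
        theta = math.atan2(x2, x1) % math.pi
        for k, angle in enumerate(angles):
            gap = abs(angle - theta)
            gap = min(gap, math.pi - gap)  # directions wrap around at pi
            weight = 1.0 - sharpness * gap / (math.pi / 4)
            if weight > 0.0:
                potential[k] += norm * weight
    return angles, potential

def estimate_mixing_columns(samples, n_sources, grid_size=180, sharpness=20.0):
    """Return unit-norm columns at the n_sources highest local maxima."""
    angles, potential = direction_potential(samples, grid_size, sharpness)
    peaks = [k for k in range(grid_size)
             if potential[k] > potential[k - 1]
             and potential[k] > potential[(k + 1) % grid_size]]
    peaks.sort(key=lambda k: potential[k], reverse=True)
    chosen = sorted(peaks[:n_sources])
    return [(math.cos(angles[k]), math.sin(angles[k])) for k in chosen]
```

With four sources mixed into two channels, `estimate_mixing_columns(samples, 4)` returns four direction vectors that together form an estimate of the 2 × 4 mixing matrix, up to scaling and ordering of the columns. Noise blurs the peaks of the potential, which is the low-SNR weakness the abstract refers to.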
Abstract: This study investigates the perceptual features of Japanese obstruent geminates among Chinese learners of Japanese, focusing on the dialectal effect of the checked tone, a syllable that ends in a stop consonant or a glottal stop and is phonetically similar to Japanese obstruent geminates. In this study, 41 native speakers of Cantonese are divided into two groups based on their proficiency and their period of learning Japanese. All stimuli employed in this study follow a C[p,k,s]+V[a,e,i] structure, such as /apa/, /eke/, /isi/. Both original sounds and synthesized sounds are used in three different parts of this study. The results show that the checked tone does have a positive effect on the perception of Japanese gemination. Furthermore, the proportion of closure duration in the entire word appears to be a more reliable and appropriate criterion for this kind of task.
Abstract: Cardiologists perform cardiac auscultation to detect
abnormalities in heart sounds. Since accurate auscultation is
a crucial first step in screening patients with heart diseases,
there is a need to develop computer-aided detection/diagnosis
(CAD) systems to assist cardiologists in interpreting heart sounds
and provide second opinions. In this paper, different algorithms
are implemented for automated heart sound classification using
unsegmented phonocardiogram (PCG) signals. Support vector
machine (SVM), artificial neural network (ANN) and Cartesian
genetic programming evolved artificial neural network (CGPANN)
classifiers, applied without any segmentation algorithm, have been
explored in this study. The signals are first pre-processed to remove
any unwanted frequencies. Both time and frequency domain features
are then extracted for training the different models. The different
algorithms are tested in multiple scenarios and their strengths and
weaknesses are discussed. Results indicate that SVM outperforms
the rest with an accuracy of 73.64%.
Abstract: Fresnel Zone Plates (FZPs) are widely used in many areas, such as optics, microwaves and acoustics. In the design of FZPs, plane wave incidence is typically assumed, but that is not usually the case in ultrasound, especially in applications where a piston emitter is placed at a certain distance from the lens. In these cases, control of the focal distance is very important, and with the usual Fresnel equation a displacement of the focus from the theoretical distance is observed because of the plane wave assumption. In this work, a comparison between an FZP designed for plane wave incidence and an FZP designed for a point source is presented for the case of a piston emitter. The influence of the main parameters of the piston on the final focusing profile has been studied. Numerical models and experimental results are shown, and they demonstrate that when spherical wave incidence is considered for the piston case, finer control of the focal distance is possible than with the classical design method.
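The difference between the two design rules can be made concrete with a small sketch. It uses the standard half-wavelength path condition for Fresnel zones: for plane wave incidence the n-th radius satisfies sqrt(F² + r²) = F + nλ/2, and for an on-axis point source at distance d it satisfies sqrt(d² + r²) + sqrt(F² + r²) = d + F + nλ/2. Treating the piston as an ideal point source, and the function names and numerical values, are assumptions for illustration only.

```python
import math

def plane_wave_radius(n, wavelength, focal):
    """Radius of the n-th Fresnel zone for plane-wave incidence."""
    return math.sqrt(n * wavelength * focal + (n * wavelength / 2.0) ** 2)

def point_source_radius(n, wavelength, focal, source_dist, iterations=200):
    """Radius of the n-th zone for an on-axis point source at source_dist.

    Solves sqrt(d^2 + r^2) + sqrt(F^2 + r^2) = d + F + n*lambda/2 by bisection.
    The plane-wave radius is a valid upper bracket because the extra source
    path only makes the total path longer at a given radius.
    """
    target = source_dist + focal + n * wavelength / 2.0
    lo, hi = 0.0, plane_wave_radius(n, wavelength, focal)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        path = math.hypot(source_dist, mid) + math.hypot(focal, mid)
        if path < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical values: 6 mm wavelength, 50 mm focal distance, source 50 mm away.
radii_plane = [plane_wave_radius(n, 0.006, 0.05) for n in range(1, 6)]
radii_point = [point_source_radius(n, 0.006, 0.05, 0.05) for n in range(1, 6)]
```

The point-source radii come out smaller than the plane-wave radii, and the two sets converge as the source moves far away. A lens built with the plane-wave radii but driven by a nearby emitter therefore focuses at a different distance than intended.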
Abstract: Phonological disorder is a serious and disturbing issue for many parents and teachers. Efforts towards resolving the problem have been undermined by other specific disabilities that are hidden from many regular and special education teachers. Against this background, this study was motivated to provide data on the prevalence of phonological disorders in children with specific language impairment (CWSLI) as the first step towards critical intervention. The study was a survey of 15 CWSLI from St. Louise Inclusive schools, Ikot Ekpene in Akwa Ibom State of Nigeria. The Phonological Processes Diagnostic Scale (PPDS), with 17 short sentences that cut across the five phonological processes examined, was validated by experts in test measurement, phonology and special education. The respondents were asked to read the sentences with emphasis on the targeted sounds. Their utterances were recorded and analyzed in the language laboratory using Praat software. Data were also collected from the clients through friendly interactions at different times. The theory of generative phonology was adopted for the descriptive analysis of the phonological processes. The data collected were analyzed using simple percentages and a composite bar chart for better understanding of the results. The study found that CWSLI exhibited the five phonological processes under investigation. It was revealed that 66.7%, 80%, 73.3%, 80%, and 86.7% of the respondents have severe deficits in fricative stopping, velar fronting, liquid gliding, final consonant deletion and cluster reduction, respectively. It was therefore recommended that a nationwide survey be carried out to obtain national statistics on CWSLI with phonological deficits and to develop intervention strategies for effective therapy to remediate the disorder.
Abstract: The psychological present has an actual extension.
When a sequence of instantaneous stimuli falls in this short interval
of time, observers perceive a compresence of events in succession
and the temporal order depends on the qualitative relationships
between the perceptual properties of the events. Two experiments
were carried out to study the influence of perceptual grouping, with
and without temporal displacement, on the duration of auditory
sequences. The psychophysical method of adjustment was adopted.
The first experiment investigated the effect of temporal displacement
of a white noise on sequence duration. The second experiment
investigated the effect of temporal displacement, along the pitch
dimension, on temporal shortening of the sequence. The results suggest
that the temporal order of sounds, in the case of temporal
displacement, is organized along the pitch dimension.
Abstract: This paper presents an automatic normal and abnormal heart sound classification model based on a deep learning algorithm. MITHSDB heart sound datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research, with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heartbeat, and each sub-segment is converted to a square intensity matrix and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features for the supervised machine learning algorithm. Instead, the features are determined automatically through training, from the time series provided. The results show that the prediction model is able to provide reasonable and comparable classification accuracy despite its simple implementation. This approach can be used for real-time classification of heart sounds in the Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signals.
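One plausible way to turn a per-beat segment into a square intensity matrix is sketched below. The linear resampling, the min-max scaling to 0..255, the row-major reshape, and the function name are all assumptions; the abstract does not specify how the conversion is done.

```python
import math

def segment_to_square(segment, side):
    """Resample one heartbeat segment to side*side samples and reshape it.

    Values are min-max scaled to integer intensities in 0..255 and laid out
    row by row (an assumed layout).
    """
    if len(segment) < 2 or side < 2:
        raise ValueError("need at least two samples and side >= 2")
    count = side * side
    resampled = []
    for i in range(count):
        pos = i * (len(segment) - 1) / (count - 1)  # fractional index into segment
        left = int(math.floor(pos))
        right = min(left + 1, len(segment) - 1)
        frac = pos - left
        resampled.append(segment[left] * (1.0 - frac) + segment[right] * frac)
    low, high = min(resampled), max(resampled)
    span = (high - low) or 1.0  # avoid dividing by zero on a flat segment
    pixels = [round(255 * (value - low) / span) for value in resampled]
    return [pixels[row * side:(row + 1) * side] for row in range(side)]
```

Resampling every beat to the same length gives the network a fixed input size regardless of heart rate, at the cost of discarding absolute timing information.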
Abstract: In this research, a quantitative assessment of the urban sound environment of the city of Biskra, Algeria, was conducted to determine the quality of the soundscape. The assessment is based on in-situ measurements taken with a Landtek SL5868P sound level meter at 47 points selected to represent the whole city. The results show that the urban noise level varies from 55.3 dB to 75.8 dB during weekdays and from 51.7 dB to 74.3 dB during the weekend. Moreover, 70.20% of the weekday measurements and 55.30% of the weekend measurements exceed the levels allowed by Algerian law and recommended by the World Health Organization. These very high urban noise levels affect quality of life and acoustic comfort, and may even pose multiple risks to people's health.
Abstract: Recognizing and controlling vocal registers during
singing is a difficult task for a beginner vocalist. It requires, among
other things, identifying which part of the natural resonators is being
used when a sound propagates through the body. Thus, an application
has been designed allowing for sound recording, automatic vocal
register recognition (VRR), and a graphical user interface providing
real-time visualization of the signal and recognition results. Six
spectral features are determined for each time frame and passed to the
support vector machine classifier yielding a binary decision on the
head or chest register assignment of the segment. The classification
training and testing data have been recorded by ten professional
female singers (soprano, aged 19-29) performing sounds for both
chest and head register. The classification accuracy exceeded 93%
in each of the various validation schemes. Apart from a hard two-class
decision, the support vector classifier also returns information on
the distance between a particular feature vector and the discrimination
hyperplane in the feature space. Such information reflects the level
of certainty of the vocal register classification in a fuzzy way. Thus,
the designed recognition and training application is able to assess and
visualize the continuous trend in singing in a user-friendly graphical
mode providing an easy way to control the vocal emission.
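The link between the distance to the hyperplane and a graded certainty can be sketched as follows. This assumes a linear decision function w·x + b, an arbitrary assignment of the positive side to the head register, and a logistic squashing of the distance; none of these details are given in the abstract, and the weights in any real system would come from training.

```python
import math

def signed_distance(weights, bias, features):
    """Signed distance from a feature vector to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score / norm

def register_with_certainty(weights, bias, features, scale=1.0):
    """Return (label, certainty) with certainty between 0.5 and 1.0.

    Assumptions: positive side means "head", negative side means "chest",
    and certainty is a logistic function of the absolute distance.
    """
    dist = signed_distance(weights, bias, features)
    label = "head" if dist >= 0 else "chest"
    certainty = 1.0 / (1.0 + math.exp(-abs(dist) / scale))
    return label, certainty
```

A frame lying close to the hyperplane yields a certainty near 0.5, which a visualization could show as a sound in transition between registers, while frames far from it are displayed as clearly chest or clearly head.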
Abstract: New sensors and technologies – such as microphones,
touchscreens or infrared sensors – are currently making their
appearance in the automotive sector, introducing new kinds of
Human-Machine Interfaces (HMIs). The interactions with such tools
might be cognitively expensive, thus unsuitable for driving tasks.
It could for instance be dangerous to use touchscreens with a
visual feedback while driving, as it distracts the driver’s visual
attention away from the road. Furthermore, new technologies in
car cockpits modify the interactions of the users with the central
system. In particular, touchscreens are preferred to arrays of buttons
for space improvement and design purposes. However, the buttons’
tactile feedback is no longer available to the driver, which makes
such interfaces more difficult to manipulate while driving. Gestures
combined with an auditory feedback might therefore constitute an
interesting alternative to interact with the HMI. Indeed, gestures can
be performed without vision, which means that the driver’s visual
attention can be totally dedicated to the driving task. In fact, the
auditory feedback can both inform the driver with respect to the task
performed on the interface and on the performed gesture, which might
constitute a possible solution to the lack of tactile information. As
audition is a relatively unused sense in automotive contexts, gesture
sonification can contribute to reducing the cognitive load thanks
to the proposed multisensory exploitation. Our approach consists
in using a virtual object (VO) to sonify the consequences of the
gesture rather than the gesture itself. This approach is motivated
by an ecological point of view: Gestures do not make sound, but
their consequences do. In this experiment, the aim was to identify
efficient sound strategies, to transmit dynamic information of VOs to
users through sound. The swipe gesture was chosen for this purpose,
as it is commonly used in current and new interfaces. We chose
two VO parameters to sonify, the hand-VO distance and the VO
velocity. Two kinds of sound parameters can be chosen to sonify the
VO behavior: Spectral or temporal parameters. Pitch and brightness
were tested as spectral parameters, and amplitude modulation as a
temporal parameter. Performances showed a positive effect of sound
compared to a no-sound situation, revealing the usefulness of sounds
to accomplish the task.
Abstract: Composite materials are one answer to the growing demand for materials with better construction and service parameters. Composite materials also permit deliberate shaping of desirable properties beyond the range achievable with metals, ceramics or polymers alone. In recent years, composite materials have been used widely in aerospace, energy, transportation, medicine, etc. Fiber-reinforced composites, including carbon fiber, glass fiber and aramid fiber composites, have become a major structural material. The typical defect arising during manufacture and operation of layered composites is delamination damage. When delamination damage spreads, it may lead to fracture of the composite. One of the many methods used in non-destructive testing of composites is active infrared thermography. In active thermography, it is necessary to deliver energy to the examined sample in order to obtain significant temperature differences indicating the presence of subsurface anomalies. To detect possible defects in composite materials, different methods of thermal stimulation can be applied to the tested material; these include heating lamps, lasers, eddy currents, microwaves and ultrasound. The choice of a suitable source of thermal stimulation can have a decisive influence on whether defects are detected or missed. Samples of multilayer carbon composites were prepared with deliberately introduced defects for comparative purposes. Very thin defects of different sizes and shapes, made of Teflon or copper with a thickness of 0.1 mm, were screened. Non-destructive testing was carried out using the following sources of thermal stimulation: heating lamp, flash lamp, ultrasound and eddy currents. The results are reported in the paper.
Abstract: Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speaker has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN, and the algorithm produces ~93% accuracy at 0 dB SNR.
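The final classification step described above, nearest training sentence by Euclidean distance between atomic index probabilities, can be sketched as follows. The matching pursuit decomposition itself is not shown: the inputs are assumed to be lists of dictionary atom indices already selected for each sentence, and the function names and data layout are hypothetical.

```python
import math
from collections import Counter

def index_probabilities(atom_indices, dictionary_size):
    """Relative frequency of each dictionary atom index in one sentence."""
    counts = Counter(atom_indices)
    total = len(atom_indices) or 1  # guard against an empty decomposition
    return [counts[i] / total for i in range(dictionary_size)]

def classify(test_indices, training, dictionary_size):
    """Return the label of the training sentence closest to the test sentence.

    training: list of (speaker_label, atom_indices) pairs.
    """
    test_p = index_probabilities(test_indices, dictionary_size)
    best_label, best_dist = None, math.inf
    for label, indices in training:
        train_p = index_probabilities(indices, dictionary_size)
        dist = math.dist(test_p, train_p)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because only the distribution of selected atoms is compared, not their amplitudes, additive noise that leaves the choice of atoms largely unchanged would have little effect on the result, which is consistent with the noise robustness reported.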
Abstract: The traditional rhythms of the West African country
of Guinea have played a centuries-long role in defining the different
people groups that make up the country. Throughout their history,
before and since colonization by the French, the different ethnicities
have used their traditional music as a distinct part of their historical
identities. That is starting to change. Guinea is an impoverished
nation created in the early twentieth century with little regard for the
history and cultures of the people who were included. The traditional
rhythms of the different people groups and their heritages have
remained. Fifteen individual traditional Guinean rhythms were
chosen to represent popular rhythms from the four geographical
regions of Guinea. Each rhythm was traced back to its native village
and video recorded on-site by as many different local performing
groups as could be located. The cyclical rhythm patterns were
transcribed via a circular, spatial design and then copied into a box
notation system where sounds happening at the same time could be
studied. These rhythms were analyzed for their consistency-over-performance
in a Fundamental Rhythm Pattern analysis so rhythms
could be compared for how they are changing through different
performances. The analysis showed that the traditional rhythm
performances of the Middle and Forest Guinea regions were the most
cohesive and showed the least evidence of change between
performances. The role of music in each of these regions is both
limited and focused. The Coastal and High Guinea regions have
much in common historically through their ethnic history and
modern-day trade connections, but the rhythm performances seem to
be less consistent and demonstrate more changes in how they are
performed today. In each of these regions the role and usage of music
is much freer and more widespread. In spite of advances being made as a
country, different ethnic groups still frequently respond to and
participate in (dance and sing to) only the music of their native ethnicity.
There is some evidence that this self-imposed musical barrier is
beginning to change and evolve, partially through the development of
better roads, more access to electricity and technology, the nationwide
Ebola health crisis, and a growing self-identification as a
unified nation.
Abstract: A blood pressure monitor or sphygmomanometer can
be either manual or automatic, employing respectively either the
auscultatory method or the oscillometric method.
The manual version of the sphygmomanometer involves an
inflatable cuff and a stethoscope used to detect the sounds
generated by the arterial walls to measure blood pressure in an artery.
An automatic sphygmomanometer can be effectively used to
monitor blood pressure through a pressure sensor, which detects
vibrations provoked by oscillations of the arterial walls.
The pressure sensor implemented in this device improves the
accuracy of the measurements taken.
Abstract: This action research examines the outcome of developing English pronunciation, using principles of phonetics, for English major students at Loei Rajabhat University. The research is split into 5 separate modules: 1) Organs of Speech and How to Produce Sounds, 2) Monophthongs, 3) Diphthongs, 4) Consonant Sounds, and 5) Suprasegmental Features. Each module followed a 4-step action research process: 1) Planning, 2) Acting, 3) Observing, and 4) Reflecting. The research targeted 2nd-year students who were majoring in English Education at Loei Rajabhat University during the 2011 academic year. A mixed methodology employing both quantitative and qualitative research was used, which put theory into action, moving from segmental features up to suprasegmental features. Multiple tools were employed, including the following documents: pre-test and post-test papers, evaluation and assessment papers, group work assessment forms, a presentation grading form, a participant observation form and a participant self-reflection form.
For all 5 modules, the target group's post-test results were higher than the pre-test results, at the 0.01 level of statistical significance. All target groups attained results ranging from low to moderate and from moderate to high performance. The participants who attained low to moderate results had to re-sit in a second round. During the first development stage, participants attended classes with group participation, in which they addressed planning through mutual co-operation and sharing of responsibility. Analytic induction of strong points for this operation showed that learner cognition, comprehension, application, and group practices were all present, whereas weak results could be attributed to biological differences, differences in life and learning, or individual differences in responsiveness and self-discipline.
Participants who were required to be re-treated in Spiral 2 received the same treatment again. Results of tests from the 5 modules after the 2nd treatment were that the participants attained higher scores than those attained in the pre-test. Their assessment and development stages also showed improved results. They showed greater confidence at participating in activities, produced higher quality work, and correctly followed instructions for each activity. Analytic induction of strong and weak points for this operation remains the same as for Spiral 1, though there were improvements to problems which existed prior to undertaking the second treatment.
Abstract: We investigate sonic cues for binaural sound localization within classrooms and present a structural model for the same. Two of the primary cues for localization are used: the interaural time difference (ITD) and the interaural level difference (ILD) created between the two ears by sounds from a particular point in space. Although these cues do not carry any information about the elevation of a sound source, the torso, head, and outer ear perform elevation-dependent spectral filtering of sounds before they reach the inner ear. This effect is commonly captured in the head-related transfer function (HRTF), which helps resolve the ambiguity left by the ITDs and ILDs alone and helps localize sounds in free space. The proposed structural model of the HRTF produces well-controlled horizontal as well as vertical effects. The implemented HRTF is a signal processing model that tries to mimic the physical effects of sounds interacting with different parts of the body. The effectiveness of the method is tested by synthesizing spatial audio in MATLAB for use in listening tests with human subjects, and it is found to yield satisfactory results in comparison with existing models.
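To give a sense of the magnitude of the ITD cue, the sketch below uses the classical spherical-head (Woodworth) approximation, ITD = (a/c)(θ + sin θ), for a distant source at azimuth θ. This formula is swapped in for illustration and is not claimed to be the structural model of the paper; the head radius, the speed of sound, and the function names are assumed values.

```python
import math

def woodworth_itd(azimuth_rad, head_radius=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a far-field source and a rigid spherical head.

    Azimuth is measured from straight ahead; valid for |azimuth| <= pi/2.
    The sign of the result follows the sign of the azimuth.
    """
    if abs(azimuth_rad) > math.pi / 2:
        raise ValueError("azimuth must lie within [-pi/2, pi/2]")
    theta = abs(azimuth_rad)
    itd = (head_radius / speed_of_sound) * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_rad)

def itd_in_samples(azimuth_rad, sample_rate):
    """Delay between the two ear signals, rounded to whole samples."""
    return round(woodworth_itd(azimuth_rad) * sample_rate)
```

In a synthesis pipeline, a delay of this size would be applied to the far-ear signal, with level differences and elevation-dependent filtering added as separate stages.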