Abstract: This paper investigates the problem of blind speech separation from a speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented to detect the activity of each speech source; the desired speech signals are then extracted from the mixture using an optimal beamformer. To evaluate the algorithm's effectiveness, a simulation using real speech recordings was performed in a double-talk situation in which both speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good level of interference suppression while maintaining a low level of distortion in the desired signal.
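For a single microphone pair, SRP-PHAT reduces to the generalized cross-correlation with phase transform (GCC-PHAT) evaluated at candidate delays. The following is a minimal NumPy sketch, assuming integer-sample delays and one microphone pair. The activity rule `source_active` and its threshold are hypothetical illustrations of how a steered PHAT response could flag a speaker as active. The optimal beamformer is not shown.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT between two microphone signals.

    Returns (lag, cc): lag > 0 means x is a delayed copy of y, and cc holds
    the PHAT-weighted correlation for lags -max_lag..max_lag.
    """
    n = len(x) + len(y)  # zero-pad so circular correlation equals linear correlation
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = X * np.conj(Y)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    lag = int(np.argmax(cc)) - max_lag
    return lag, cc


def source_active(cc, expected_lag, max_lag, threshold):
    """Hypothetical VAD rule: flag a source as active in this frame when the
    PHAT response steered to its known lag exceeds a threshold."""
    return bool(cc[expected_lag + max_lag] > threshold)
```

For example, if `mic2` is white noise `mic1` delayed by 5 samples, `gcc_phat(mic2, mic1, max_lag=20)` returns a lag of 5. The PHAT weighting whitens the cross-spectrum, which is what makes the peak sharp in reverberant rooms.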
Abstract: Eighty-four deaf students (from primary school to college) and their families participated in this inclusion project in cooperation with numerous institutions in northern Italy (Brescia, Lombardy). Participants were either congenitally deaf or their deafness was related to other pathologies. This research promoted the integration of deaf students as they passed from primary school to high school to college. Learning methods and processes that focused on encouraging individual autonomy and socialization were studied. The research team and its collaborators included school teachers, speech therapists, psychologists and home tutors, as well as teaching assistants, child neuropsychiatrists and other external authorities involved in social inclusion programs for deaf persons. Deaf children and their families were supported in terms of inclusion and were made aware of the research team's focus on Bisogni Educativi Speciali (BES, or Special Educational Needs) (L.170/2010 - DM 5669/2011). The project included a diagnostic and evaluative phase as well as an operational one. Results demonstrated that deaf children were highly satisfied and confident, that their academic performance improved and that collaboration in school increased. Deaf children felt that they had access to high school and college. The families of deaf children were empowered through networking among local services that deal with the deaf, and family satisfaction also improved. We found that teachers and others who supported deaf children increased their professional skills. Achieving autonomy and developing instrumental, communicative and relational abilities were also found to be crucial. Project success was determined by temporal continuity, a clear theoretical methodology, a strong alliance around the project's direction and a resilient team response.
Abstract: The paper examines the performance of the acoustic-spectrographic voice
identification method on non-native language speech. Performance is evaluated
by comparing the results of analyzing recordings of native language
speech with those of recordings of foreign language speech. Our
research is based on Tajik and Russian speech produced by native Tajik
speakers, owing to the nature of the criminal situation related to drug
trafficking. We propose a pilot experiment that represents a first
attempt to enter the field.
Abstract: When Dwight D. Eisenhower delivered his final Presidential speech in 1961, he used the opportunity to bid farewell to America, but he was also trying to warn his fellow countrymen about deeper challenges threatening the country. In this analysis, Eisenhower’s speech is examined in light of its impact on American culture, on communication concepts, and in terms of its political ramifications. The paper first reviews the previous literature on the speech, especially around its 50th anniversary, and reveals a man whose main concern was how the speech’s words would affect his beloved country. Eisenhower’s painstaking approach to the wording of the speech is key to revealing its intent, particularly when his motivations are analyzed according to “virtuous communication.” This philosophical construct indicates that Eisenhower’s Farewell Address was crafted carefully according to a departing President’s deepest values and concerns, concepts that he wanted to pass along to his successor, to his country, and even to the world.
Abstract: Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane, using spectrograms or cochleagrams as the images. In this paper, atomic decomposition implemented by matching pursuit transforms time-series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space,” where each populated T-F position contains an amplitude weight. The weight space vector, together with the atomic dictionary, represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single TIMIT dialect region. Each speaker has 10 sentences: two are used for training and eight for testing. Atomic index probabilities are created for each training sentence and for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and those from the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. This baseline accuracy is attributed to the short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN: the algorithm still achieves ~93% accuracy at 0 dB SNR.
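The matching-pursuit and nearest-distance steps can be sketched as follows. This is a generic greedy matching pursuit over any dictionary with unit-norm columns. It is not the paper's T-F atoms or sparse-autoencoder features. `index_probabilities` and `classify` are hypothetical helpers: the first turns the selected atom indices into a normalized histogram, and the second picks the speaker at the lowest Euclidean distance.

```python
import numpy as np


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit.

    dictionary: (n_samples, n_dict) array whose columns (atoms) have unit norm.
    Returns the selected atom indices and their amplitude weights.
    """
    residual = np.asarray(signal, dtype=float).copy()
    indices, weights = [], []
    for _ in range(n_atoms):
        corr = dictionary.T @ residual          # inner product with every atom
        k = int(np.argmax(np.abs(corr)))        # best-matching atom
        indices.append(k)
        weights.append(corr[k])
        residual -= corr[k] * dictionary[:, k]  # remove that atom's contribution
    return np.array(indices), np.array(weights)


def index_probabilities(indices, n_dict):
    """Relative frequency of each atom index (a normalized histogram)."""
    counts = np.bincount(indices, minlength=n_dict).astype(float)
    return counts / counts.sum()


def classify(test_probs, train_probs_by_speaker):
    """Return the speaker whose training probabilities are closest in Euclidean distance."""
    return min(train_probs_by_speaker,
               key=lambda spk: np.linalg.norm(test_probs - train_probs_by_speaker[spk]))
```

With an 8x8 identity dictionary (a toy assumption), a signal whose entries are 3.0 at index 2 and -1.0 at index 5 decomposes into atoms [2, 5] with weights [3.0, -1.0].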
Abstract: Knowledge of the relationships between characters can help readers understand the overall story or plot of a work of literary fiction. In this paper, we present a method for extracting specific relationships between characters from Korean literary fiction. Existing methods for extracting relationships between characters in text are generally statistical or computational methods based on the sentence distance between characters, and they do not consider Korean linguistic features. Furthermore, these methods have difficulty extracting directed relationships from text, such as one-sided love, because they consider only the weight of a relationship and not its direction. Therefore, in order to identify specific relationships between characters, we propose a statistical method that considers Korean linguistic features such as syntactic patterns and speech verbs. The result of our method is represented as a weighted directed graph of the relationships between the characters. We also expect that the proposed method could be applied to analyzing the relationships between characters in other content, such as movies or TV dramas.
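The abstract represents its output as a weighted directed graph. A minimal sketch of that representation follows. It assumes the relation-extraction step (syntactic patterns and speech verbs, not shown) has already produced hypothetical `(source, target, label, weight)` tuples. Keying edges by ordered character pairs keeps A to B separate from B to A, so a one-sided relationship appears as an asymmetric edge.

```python
from collections import defaultdict


def build_relation_graph(relations):
    """Accumulate (source, target, label, weight) tuples into a weighted directed graph.

    Returns {(source, target): {label: total_weight}}. Because edges are keyed
    by ordered pairs, (A, B) and (B, A) are stored independently.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for source, target, label, weight in relations:
        graph[(source, target)][label] += weight
    return {edge: dict(labels) for edge, labels in graph.items()}
```

For instance, two "love" observations from A to B and one "indifference" observation from B to A produce two distinct edges with different labels. This is the kind of asymmetry that undirected, distance-based weighting cannot express.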
Abstract: Recently, Automatic Speech Recognition (ASR) systems have been used to assist children in language acquisition because they can detect human speech signals. Despite the benefits offered by ASR systems, there is a lack of ASR systems for Malay-speaking children. One contributing factor is the lack of a continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced languages, it is not viable for children because very few children's speech databases are available to serve as a source model. In this research, we propose a two-stage adaptation for developing an ASR system for Malay-speaking children using a very limited database. The two-stage adaptation comprises cross-lingual adaptation (first stage) and cross-age adaptation (second stage). In the first stage, a well-known speech database that is phonetically rich and balanced is adapted to a medium-sized Malay adult speech database using supervised Maximum Likelihood Linear Regression (MLLR). The second stage takes the acoustic model generated by the first stage and adapts it to the target database, a small database of the target users' speech. We measured the performance of the proposed technique using word error rate and compared it with the conventional benchmark adaptation. The proposed two-stage adaptation achieves better recognition accuracy than the benchmark adaptation in recognizing children’s speech.
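Word error rate, the evaluation measure named above, is the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. A plain-Python sketch follows. It assumes a non-empty reference. The MLLR adaptation stages are not shown, and the Malay sentences in the usage note are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance. Assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("saya suka makan nasi", "saya makan nasi goreng")` gives 0.5: one deletion and one insertion over four reference words.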
Abstract: This paper describes a method for estimating the variance of AWGN (Additive White Gaussian Noise) in noisy stochastic signals, referred to as Multiplicative-Noising Variance Estimation (MNVE). The aim was to develop an estimation algorithm with a minimal number of assumptions about the structure of the original signal. MATLAB simulations and an analysis of the results for the method applied to speech signals showed higher accuracy than the standard AR (autoregressive) modeling noise estimation technique. In addition, strong performance was observed at very low signal-to-noise ratios, which generally represent the worst-case scenario for signal denoising methods. High execution time appears to be the only disadvantage of MNVE. After close examination of all the observed features of the proposed algorithm, it was concluded that the method is worth exploring and that, with some further adjustments and improvements, it could be enviably powerful.
Abstract: Numerous signal-processing-based speech enhancement systems have been proposed to improve intelligibility in the presence of noise. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A method is presented for recoding high-frequency speech components into a low-frequency region to increase audibility for listeners with hearing loss. The purpose of the paper is to enhance speech formants using the Kaiser window. The pitch and formants of the signal are estimated using the autocorrelation, zero-crossing and magnitude difference function methods. The formant enhancement stage aims to restore the representation of formants at the level of the midbrain. The system, which has low complexity, is implemented in MATLAB.
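Two of the pitch cues mentioned above, autocorrelation and zero crossings, can be sketched in NumPy as follows. The Kaiser shape parameter `beta=6.0` and the 60 to 400 Hz search range are assumptions, not values from the paper. Neither the magnitude difference function nor the formant enhancement and frequency-lowering stages are shown.

```python
import numpy as np


def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0, beta=6.0):
    """Estimate F0 (Hz) of a voiced frame from the peak of its autocorrelation.

    The frame is first tapered with a Kaiser window (shape parameter beta);
    only lags corresponding to fmin..fmax are searched.
    """
    x = frame * np.kaiser(len(frame), beta)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # r[k] for lags k >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag


def zero_crossing_rate(frame, fs):
    """Sign changes per second; roughly twice F0 for a clean periodic signal."""
    crossings = np.count_nonzero(np.diff(np.signbit(frame)))
    return crossings * fs / (len(frame) - 1)
```

For a 200 Hz sine sampled at 8 kHz, `autocorr_pitch` finds the peak at a lag of 40 samples and returns 200 Hz, and the zero-crossing rate is close to 400 crossings per second. Real speech frames are noisier, which is why systems usually combine several such cues.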
Abstract: The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar that is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that is informally acquired and mastered by students as native speakers of the language outside the classroom. Corpus grammar is described based on actual usage in naturally occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing both a reduplicated nominal form denoting plurality (e.g. murid-murid ‘students’, derived from murid ‘student’) and a modifier categorized as a quantifier (e.g. semua ‘all’, seluruh ‘entire’ and kebanyakan ‘most’) is not acceptable in the school grammar, because such a construction (e.g. semua murid-murid ‘all the students’ or kebanyakan pelajar-pelajar ‘most of the students’) is claimed to be redundant, and redundancy is prohibited in that grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is strictly required for the message to be understood, so that, if some information is omitted, the remaining information is still sufficient for the message to be comprehended. Thus, the only correct constructions are the reduplicated form (e.g. murid-murid ‘students’) or the quantifier plus the root (e.g. semua murid ‘all the students’), so that the grammatical meaning of plurality is not repeated. Nevertheless, the so-called redundant form (e.g. kebanyakan pelajar-pelajar ‘most of the students’) is frequently used in the corpus. This study shows that a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication and combinations of both.
Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication, namely to give emphasis and to stress the importance of the information delivered by the speakers or writers.
Abstract: Scripts are one of the basic text resources for understanding
broadcast content. Topic modeling is a method for summarizing
broadcast content from its scripts. Generally,
scripts describe content through directions and speech,
and provide scene segments that can be seen as semantic units.
Therefore, a script can be topic modeled by treating each scene segment
as a document. However, because scene segments consist mainly of speech,
relatively few word co-occurrences are observed within them, which
inevitably lowers the quality of the topics learned by statistical
methods. To tackle this problem, we
propose a method that improves topic quality with additional word
co-occurrence information obtained from scene similarities. The
main idea is that knowing that two or more texts are topically related
can help in learning high-quality topics. In turn, more accurate
topical representations yield more accurate information about whether
two texts are related or not. In this paper, we regard two scene segments
as related if their topical similarity is sufficiently high. We also consider
words to co-occur if they appear together in topically related scene segments.
By iteratively inferring topics and determining semantically
neighboring scene segments, we obtain a topic space that represents
broadcast content well. In experiments, the
proposed method generated higher-quality topics from Korean
drama scripts than the baselines.
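The scene-relatedness and co-occurrence ideas above can be sketched as follows. The sketch assumes that per-scene topic proportions `theta` are already available from some topic model; the inference step is not shown. Cosine similarity with a fixed threshold stands in for "topical similarity is sufficiently high", which is an assumption. In the iterative scheme the abstract describes, these two steps would alternate with topic inference.

```python
import numpy as np
from itertools import combinations


def related_scene_pairs(theta, threshold):
    """Index pairs (i, j), i < j, whose topic vectors have cosine similarity >= threshold.

    theta: (n_scenes, n_topics) array of per-scene topic proportions.
    """
    unit = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    sim = unit @ unit.T
    return [(i, j) for i, j in combinations(range(len(theta)), 2)
            if sim[i, j] >= threshold]


def cooccurrence_counts(scenes, pairs):
    """Word-pair counts from within-scene co-occurrence plus related-scene pairs.

    scenes: list of sets of words; pairs: output of related_scene_pairs.
    """
    counts = {}

    def bump(a, b):
        key = (min(a, b), max(a, b))          # unordered word pair
        counts[key] = counts.get(key, 0) + 1

    for words in scenes:                      # ordinary within-scene co-occurrence
        for a, b in combinations(sorted(words), 2):
            bump(a, b)
    for i, j in pairs:                        # extra co-occurrence across related scenes
        for a in scenes[i]:
            for b in scenes[j]:
                if a != b:
                    bump(a, b)
    return counts
```

Take three scenes where only the first two have similar topic mixtures. Words from those two scenes gain extra cross-scene counts, while the unrelated third scene contributes nothing. This densifies the sparse co-occurrence statistics of short, dialogue-heavy scenes.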
Abstract: The paper shows that, when sense is transferred from the
source language (SL) to the target language (TL), the translator’s reading against the grain determines the
creation of a faulty pattern of rendering the original meaning in the
receiving culture which reflects the use of misleading transformative
codes. In this case, the translator is a writer per se who decides what
goes in and out of the book, how the style is to be ciphered and what
elements of ideology are to be highlighted. The paper also proves that
figurative language must not be flattened for the sake of clarity or
naturalness. The missing figurative elements make the translated text
less interesting, less challenging and less vivid, which reflects poorly
on the writer. There is a close connection between style and the
writer’s person. If the writer’s style is greatly altered in a
translation, the translation is useless, as the original writer and his /
her imaginative world can no longer be discovered. The purpose of the paper is to prove that adaptation is a dangerous
tool which leads to variants that sometimes reflect the original less
than the reader would wish. It contradicts the very essence of the
process of translation, which is that of making an original work
available in a foreign language. If the adaptive transformative codes
are so flexible that they encourage the translator to repeatedly leave
out parts of the original work, then a subversive pattern emerges
which changes the entire book. In conclusion, as a result of using adaptation, manipulative or
subversive effects are created in the translated work. This is generally
achieved by adding new words or connotations, creating new figures
of speech or using explicitations. The additional meanings of the
original work are neglected and the translator creates new meanings,
implications, emphases and contexts. Again s/he turns into a new
author who enjoys the freedom of expressing his / her own ideas
without the constraints of the original text. Reading against the grain
is inadvisable during the process of translation and, consequently,
following personal common sense becomes essential in the field of
translation as well as everywhere else, so that translation should not
become a source of fantasy.
Abstract: The aim of the study is to compare behavioral and
EEG reactions in Turkic-speaking inhabitants of Siberia (Tuvinians
and Yakuts) and Russians during the recognition of syntax errors in
native and foreign languages. Sixty-three healthy indigenous inhabitants of the
Tyva Republic, 29 inhabitants of the Sakha (Yakutia) Republic, and
55 Russians from Novosibirsk participated in the study. EEGs were
recorded during an error-recognition task in Russian and
English (in all participants) and in the native language
(in the Tuvinian and Yakut Turkic-speaking participants). Reaction time (RT)
and quality of task execution were chosen as behavioral measures.
The amplitude and cortical distribution of the P300 and P600 ERP peaks
were used as measures of speech-related brain activity. In Tuvinians,
there were no differences in P300 and P600 amplitudes or in
cortical topology between Russian and Tuvinian, but there
was a difference for English. In Yakuts, the P300 and P600
amplitudes and ERP topology for Russian were the same
as those of Russians for their native language. In Yakuts, brain reactions during
Yakut and English comprehension did not differ, while
Russian comprehension differed from both Yakut
and English. We found that the Tuvinians recognized both Russian and
Tuvinian as native languages, and English as a foreign language. The
Yakuts recognized both English and Yakut as foreign languages, but
Russian as a native language. According to the questionnaire, both
Tuvinians and Yakuts use the national language as a spoken
language, whereas they do not use it for writing. This may well be the
reason that Yakuts perceive written Yakut as a foreign
language and written Russian as native.
Abstract: The purpose of the study is to find out the relationship between
authority and globalization in the moral messages of proverbs.
Proverbs are one of the many forms of cultural identity of the
Indonesian/Malay people and are filled with moral values. The values
contained within these proverbs are beneficial not only to society,
but also to those who hold power in this era of globalization.
The method used is qualitative research through content
analysis, carried out by describing and uncovering the forms and
meanings of proverbs used within the Minangkabau society of Indonesia.
The data for this study were obtained from a native Minangkabau
speaker in the sub-district of Tanah Abang, Jakarta, through a series
of interviews with the speaker, whose speech is still adorned with
idiomatic expressions. The research findings show that 30
existing proverbs or idiomatic expressions in the Minangkabau
language are often used by its indigenous people. These thirty items contain
moral values that are closely interwoven with the matter of power
and globalization. Analytical results show that the fourteen moral
values contained within the proverbs reflect a firm connection between
rule and power in globalization, namely: responsible, brave,
togetherness and consensus, tolerance, politeness, thorough and
meticulous, honest and keeping promises, ingenious and learning,
care, self-correction, be fair, alert, arbitrary, self-awareness.
Structurally, proverbs have a fixed formal construction;
symbolically, proverbs have meanings that are clearly determined
by ethnographic communicative factors along with situational
and cultural contexts. Values contained within proverbs may be used
as a guide in social management, be it between fellow men, between
men and nature, or even between men and their Creator. Therefore,
the meanings and values contained within the morals of proverbs
could also be used as counsel for those who rule and are in charge of
power, in order to stem the tide of globalization that has already
spread into sectoral, territorial and educational continuums.
Abstract: Speaker Identification (SI) is the task of establishing
the identity of an individual based on his/her voice characteristics. The SI
task is typically achieved in two stages of signal processing: training and
testing. The training process calculates speaker-specific feature
parameters from the speech and generates speaker models
accordingly. In the testing phase, speech samples from unknown
speakers are compared with the models and classified. Even though
the performance of speaker identification systems has improved due to
recent advances in speech processing techniques, there is still room for
improvement. In this paper, a Closed-Set Text-Independent Speaker
Identification System (CISI) based on a Multiple Classifier System
(MCS) is proposed, using Mel Frequency Cepstral Coefficients
(MFCC) for feature extraction and a suitable combination of vector
quantization (VQ) and Gaussian Mixture Models (GMM), together
with the Expectation Maximization (EM) algorithm, for speaker
modeling. Using a Voice Activity Detector (VAD) with a hybrid
approach based on Short Time Energy (STE) and Statistical
Modeling of Background Noise in the pre-processing step of
feature extraction yields a better and more robust automatic speaker
identification system. In addition, using the Linde-Buzo-Gray (LBG)
clustering algorithm to initialize the GMM parameters for the EM
step improved the convergence rate and system performance. The
system also uses a relative index as a confidence measure when the
GMM and VQ identification results contradict each other. Simulations
carried out in MATLAB on the voxforge.org speech database highlight
the efficacy of the proposed method compared with earlier work.
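The VQ branch of such a system can be sketched as follows. Each speaker gets an LBG codebook, built by repeatedly splitting codewords and refining them with k-means, and a closed-set decision picks the lowest average distortion. This minimal NumPy sketch assumes feature vectors (for example MFCC frames) are already extracted. The split factor `eps`, the squared-Euclidean distortion and the iteration count are assumptions. The GMM/EM branch, the VAD and the relative-index confidence measure are not shown.

```python
import numpy as np


def lbg_codebook(features, size, eps=0.01, n_iter=20):
    """Linde-Buzo-Gray codebook design: repeated splitting plus k-means refinement.

    features: (n_frames, n_dims) array, e.g. MFCC vectors; size: a power of two.
    """
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dist.argmin(axis=1)
            for k in range(len(codebook)):
                members = features[nearest == k]
                if len(members):               # leave empty cells unchanged
                    codebook[k] = members.mean(axis=0)
    return codebook


def avg_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.min(axis=1).mean()


def identify(features, codebooks):
    """Closed-set decision: the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(features, codebooks[spk]))
```

In use, one codebook is trained per enrolled speaker, and `identify(test_frames, {"spk1": cb1, "spk2": cb2})` returns the label with the smallest distortion.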
Abstract: Over the past few years, a lot of research has been
conducted to bring Automatic Speech Recognition (ASR) into various
areas of Air Traffic Control (ATC), such as air traffic control
simulation and training, monitoring live operators with the aim
of improving safety, measuring air traffic controller workload,
and analyzing large quantities of controller-pilot speech.
Due to the high accuracy requirements of the ATC context and its
unique challenges, automatic speech recognition has not been widely
adopted in this field. With the aim of providing a good starting
point for researchers who are interested in bringing automatic speech
recognition into ATC, this paper gives an overview of possibilities
and challenges of applying automatic speech recognition in air traffic
control. To provide this overview, we present an updated literature
review of speech recognition technologies in general, as well as
specific approaches relevant to the ATC context. Based on this
literature review, criteria for selecting speech recognition approaches
for the ATC domain are presented, and remaining challenges and
possible solutions are discussed.
Abstract: Today, there is a large number of political transcripts
available on the Web to be mined and used for statistical analysis
and product recommendations. As online political resources are
used for various purposes, automatically determining the political
orientation of these transcripts becomes crucial. The machine learning
methods used for automatic classification rely on different features
that are grouped into categories such as linguistic and personality
features. Considering the ideological
differences between Liberals and Conservatives, in this paper, the
effect of personality traits on political orientation classification is
studied. The experiments in this study were based on the correlation
between LIWC features and the Big Five personality traits. Several
experiments were conducted using the Convote U.S. Congressional
Speech dataset with seven benchmark classification algorithms. The
different methodologies were applied to several LIWC feature sets
containing between 8 and 64 features correlated with the five
personality traits. The experiments showed Neuroticism to be the most
differentiating personality trait for classifying political orientation. It
was also observed that the personality-trait-based classification
methodology gives results better than or comparable to those of related
work.
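LIWC feature subsets can be selected by their correlation with a personality trait: rank the features by absolute Pearson correlation and keep the top `k`. The sketch below assumes each document already has a trait score; the abstract does not say how these scores were obtained. The function name and the top-`k` rule are hypothetical, the sketch assumes no constant feature columns, and no classifier is shown.

```python
import numpy as np


def top_correlated_features(X, trait, k):
    """Indices of the k feature columns with the largest |Pearson r| against a trait.

    X: (n_documents, n_features) matrix, e.g. LIWC category scores;
    trait: (n_documents,) trait scores for the same documents.
    Assumes no feature column is constant (that would divide by zero).
    Returns (indices, r) where r holds the correlation of every feature.
    """
    Xc = X - X.mean(axis=0)
    tc = trait - trait.mean()
    r = (Xc * tc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((tc ** 2).sum()))
    return np.argsort(-np.abs(r))[:k], r
```

Varying `k` (for example between 8 and 64, the range reported above) yields the nested feature sets that the classifiers would then be trained on.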
Abstract: The notion of power and gender domination is one of
the inseparable aspects of the themes of postmodern literature. The
reasons for its importance have been discussed frequently since the rise
of Michel Foucault and his insight into the circulation of power and
the transgression of forces. Language and society operate as the basic
grounds for the study, as all human beings are bound to the set of
rules and norms which shape them in the acceptable way in the
macrocosm. How different genders in different positions behave and
react to the provocation of social forces and to the superiority of
one over another is of great interest to writers and literary critics. Mamet’s
works are noticeable for their controversial but timely themes, which
illustrate human conflicts with society and the greed for power. Many
critics, such as Christopher Bigsby and Harold Bloom, have discussed
Mamet and his ideas in recent years. This paper is a study of
Oleanna, Mamet’s masterpiece about the teacher-student relationship
and the circulation of power between a man and a woman. Mamet shows
the very fragile boundaries in the domination of a gender and the
downfall of speech as the consequence of transgression and freedom.
The failure of the language the teacher uses and the abuse of his own
words by a student who seeks superiority and knowledge are the
main subjects of the discussion. Supported by the ideas of Foucault,
the language Mamet uses to present his characters becomes the
fundamental premise in this study. As a result, language becomes
both the means of achievement and downfall.
Abstract: For the composer Myriam Marbe, musical
time and memory represent two (complementary) phenomena with
a decisive impact on the establishment of new musical ontologies.
Summarizing the most important achievements of contemporary
composition techniques, her vision of the microform presented in
The Concert for Daniel Kientzy, saxophone and orchestra transcends
linear and unidirectional time in favour of a flexible, multivectorial
discourse with spiral developments, where the sound substance
is auto(re)generated by analogy with the fundamental processes of
memory. The conceptual model is archetypal in essence, the
composer being concerned with identifying the mechanisms of
the creative process, especially those specific to collective
creation (of oral tradition). Hence the spontaneity of expression,
the improvisational tint, free rhythm, micro-interval intonation, and a coloristic-timbral
universe dominated by multiphonics and unique sound
effects; hence also the atmosphere of ritual, though purged of its
primary connotations and reprojected into a wonderfully spectacular
space. The Concert is a work of artistic maturity and commands respect,
among other things, through the timbral diversity of the three types of
saxophone required by the composer (baritone, sopranino and
alto); in Part III, Daniel Kientzy plays two
saxophones simultaneously. The score of Myriam
Marbe contains deeply spiritualized music, full of archetypal
symbols, a music whose drama suggests a real cinematographic
movement.
Abstract: This research study aims to present a retrospective
study of speech recognition systems and artificial intelligence.
Speech recognition has become one of the most widely used technologies,
as it offers a great opportunity to interact and communicate with
automated machines. In particular, speech
recognition assists its users and helps them perform their daily
routine tasks in a more convenient and effective manner. This
research intends to illustrate recent technological
advancements associated with artificial intelligence.
Recent research has revealed that speech recognition still faces
major issues that affect the decoding of speech. In
order to overcome these issues, researchers have developed different
statistical models. Some of the most prominent statistical
models include the acoustic model (AM), the language model (LM), the lexicon
model, and hidden Markov models (HMM). This research will help in
understanding all of these statistical models of speech recognition.
Researchers have also formulated different decoding methods, which
are used for realistic decoding tasks and constrained
artificial languages. These decoding methods include pattern
recognition, acoustic-phonetic, and artificial intelligence approaches. It has been
recognized that artificial intelligence is the most efficient and reliable
of the methods used in speech recognition.
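Decoding with the hidden Markov models mentioned above is commonly done with the Viterbi algorithm. The sketch below decodes a toy discrete HMM in log space. The two-state parameters in the usage note are hypothetical. A real recognizer would combine acoustic, lexicon and language models rather than use a single small HMM.

```python
import numpy as np


def viterbi(obs, start, trans, emit):
    """Most likely hidden-state path of a discrete HMM, computed in log space.

    start: (S,) initial probabilities; trans: (S, S) with trans[i, j] = P(j | i);
    emit: (S, V) with emit[s, o] = P(o | s); obs: sequence of symbol indices.
    """
    log_t, log_e = np.log(trans), np.log(emit)
    score = np.log(start) + log_e[:, obs[0]]
    back = []
    for o in obs[1:]:
        cand = score[:, None] + log_t        # cand[i, j]: best path ending in i, then i -> j
        back.append(cand.argmax(axis=0))     # best predecessor for each state j
        score = cand.max(axis=0) + log_e[:, o]
    path = [int(score.argmax())]
    for ptr in reversed(back):               # follow back-pointers to the start
        path.append(int(ptr[path[-1]]))
    return path[::-1]
```

With start = [0.5, 0.5], trans = [[0.7, 0.3], [0.3, 0.7]], emit = [[0.9, 0.1], [0.1, 0.9]] and observations [0, 0, 1, 1], `viterbi` returns [0, 0, 1, 1]. Working in log space avoids numerical underflow on long observation sequences.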