Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments

This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.

An Inclusion Project for Deaf Children into a Northern Italy Contest

84 deaf students (from primary school to college) and their families participated in this inclusion project in cooperation with numerous institutions in northern Italy (Brescia-Lombardy). Participants were either congenitally deaf or their deafness was related to other pathologies. This research promoted the integration of deaf students as they pass from primary school to high school to college. Learning methods and processes were studied that focused on encour­aging individual autonomy and socialization. The research team and its collaborators included school teachers, speech ther­apists, psychologists and home tutors, as well as teaching assistants, child neuropsychiatrists and other external authorities involved with deaf persons social inclusion programs. Deaf children and their families were supported, in terms of inclusion, and were made aware of the research team that focused on the Bisogni Educativi Speciali (BES or Special Educational Needs) (L.170/2010 - DM 5669/2011). This project included a diagnostic and evaluative phase as well as an operational one. Results demonstrated that deaf children were highly satisfied and confident; academic performance improved and collaboration in school increased. Deaf children felt that they had access to high school and college. Empowerment for the families of deaf children in terms of networking among local services that deal with the deaf also improved while family satisfaction also improved. We found that teachers and those who gave support to deaf children increased their professional skills. Achieving autonomy, instrumental, communicative and relational abilities were also found to be crucial. Project success was determined by temporal continuity, clear theoretical methodology, strong alliance for the project direction and a resilient team response.

Performance Evaluation of Acoustic-Spectrographic Voice Identification Method in Native and Non-Native Speech

The paper deals with acoustic-spectrographic voice identification method in terms of its performance in non-native language speech. Performance evaluation is conducted by comparing the result of the analysis of recordings containing native language speech with recordings that contain foreign language speech. Our research is based on Tajik and Russian speech of Tajik native speakers due to the character of the criminal situation with drug trafficking. We propose a pilot experiment that represents a primary attempt enter the field.

Eisenhower’s Farewell Speech: Initial and Continuing Communication Effects

When Dwight D. Eisenhower delivered his final Presidential speech in 1961, he was using the opportunity to bid farewell to America, but he was also trying to warn his fellow countrymen about deeper challenges threatening the country. In this analysis, Eisenhower’s speech is examined in light of the impact it had on American culture, communication concepts, and political ramifications. The paper initially highlights the previous literature on the speech, especially in light of its 50th anniversary, and reveals a man whose main concern was how the speech’s words would affect his beloved country. The painstaking approach to the wording of the speech to reveal the intent is key, particularly in light of analyzing the motivations according to “virtuous communication.” This philosophical construct indicates that Eisenhower’s Farewell Address was crafted carefully according to a departing President’s deepest values and concerns, concepts that he wanted to pass along to his successor, to his country, and even to the world.

Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principals of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleargrams as the image. In this paper atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arraignment or of the atomic indices in the T-F vector are used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speak has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between then probabilities from the training sentences and the test sentences. Training is done at a 30dB Signal-to-Noise Ratio (SNR). Testing is performed at SNR’s of 0 dB, 5 dB, 10 dB and 30dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and produces ~93% accuracy at 0dB SNR.

A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

The knowledge of the relationship between characters can help readers to understand the overall story or plot of the literary fiction. In this paper, we present a method for extracting the specific relationship between characters from a Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters without considering Korean linguistic features. Furthermore, it is difficult to extract the relationship with direction from text, such as one-sided love, because they consider only the weight of relationship, without considering the direction of the relationship. Therefore, in order to identify specific relationships between characters, we propose a statistical method considering linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationship between the characters. Furthermore, we expect that proposed method could be applied to the relationship analysis between characters of other content like movie or TV drama.

A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children

Recently, Automatic Speech Recognition (ASR) systems were used to assist children in language acquisition as it has the ability to detect human speech signal. Despite the benefits offered by the ASR system, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors for this is the lack of continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced language, it is not viable for children as there are very limited speech databases as a source model. In this research, we propose a two-stage adaptation for the development of ASR system for Malay-speaking children using a very limited database. The two stage adaptation comprises the cross-lingual adaptation (first stage) and cross-age adaptation. For the first stage, a well-known speech database that is phonetically rich and balanced, is adapted to the medium-sized Malay adults using supervised MLLR. The second stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We have measured the performance of the proposed technique using word error rate, and then compare them with the conventional benchmark adaptation. The two stage adaptation proposed in this research has better recognition accuracy as compared to the benchmark adaptation in recognizing children’s speech.

An Approach to Noise Variance Estimation in Very Low Signal-to-Noise Ratio Stochastic Signals

This paper describes a method for AWGN (Additive White Gaussian Noise) variance estimation in noisy stochastic signals, referred to as Multiplicative-Noising Variance Estimation (MNVE). The aim was to develop an estimation algorithm with minimal number of assumptions on the original signal structure. The provided MATLAB simulation and results analysis of the method applied on speech signals showed more accuracy than standardized AR (autoregressive) modeling noise estimation technique. In addition, great performance was observed on very low signal-to-noise ratios, which in general represents the worst case scenario for signal denoising methods. High execution time appears to be the only disadvantage of MNVE. After close examination of all the observed features of the proposed algorithm, it was concluded it is worth of exploring and that with some further adjustments and improvements can be enviably powerful.

Speech Enhancement of Vowels Based on Pitch and Formant Frequency

Numerous signal processing based speech enhancement systems have been proposed to improve intelligibility in the presence of noise. Traditionally, studies of neural vowel encoding have focused on the representation of formants (peaks in vowel spectra) in the discharge patterns of the population of auditory-nerve (AN) fibers. A method is presented for recording high-frequency speech components into a low-frequency region, to increase audibility for hearing loss listeners. The purpose of the paper is to enhance the formant of the speech based on the Kaiser window. The pitch and formant of the signal is based on the auto correlation, zero crossing and magnitude difference function. The formant enhancement stage aims to restore the representation of formants at the level of the midbrain. A MATLAB software’s are used for the implementation of the system with low complexity is developed.

Redundancy in Malay Morphology: School Grammar versus Corpus Grammar

The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar which is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that is informally acquired and mastered by the students as native speakers of the language outside the classroom. Corpus grammar is depicted based on its actual used in natural occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing nominal reduplicated form which denotes plurality (i.e. murid-murid ‘students’ which is derived from murid ‘student’) and a modifier categorized as quantifiers (i.e. semua ‘all’, seluruh ‘entire’, and kebanyakan ‘most’) is not acceptable in the school grammar because the formation (i.e. semua murid-murid ‘all the students’ kebanyakan pelajar-pelajar ‘most of the students’) is claimed to be redundant, and redundancy is prohibited in the grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is precisely required for the message to be understood, so that, if some information is omitted, the remaining information will still be sufficient for the message to be comprehended. Thus, the correct construction to be used is strictly the reduplicated form (i.e. murid-murid ‘students’) or the quantifier plus the root (i.e. semua murid ‘all the students’) with the intention that the grammatical meaning of plural is not repeated. Nevertheless, the so-called redundant form (i.e. kebanyakan pelajar-pelajar ‘most of the students’) is frequently used in the corpus grammar. This study shows that there are a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication and combination of both. Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication that is to give emphasis and to stress the importance of the information delivered by the speakers or writers.

Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Scripts are one of the basic text resources to understand broadcasting contents. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scene segments consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics by statistical learning method. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, more accurate topical representations lead to get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. By iteratively inferring topics and determining semantically neighborhood scene segments, we draw a topic space represents broadcasting contents well. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

Reading against the Grain: Transcodifying Stimulus Meaning

The paper shows that on transferring sense from the SL to the TL, the translator’s reading against the grain determines the creation of a faulty pattern of rendering the original meaning in the receiving culture which reflects the use of misleading transformative codes. In this case, the translator is a writer per se who decides what goes in and out of the book, how the style is to be ciphered and what elements of ideology are to be highlighted. The paper also proves that figurative language must not be flattened for the sake of clarity or naturalness. The missing figurative elements make the translated text less interesting, less challenging and less vivid which reflects poorly on the writer. There is a close connection between style and the writer’s person. If the writer’s style is very much altered in a translation, the translation is useless as the original writer and his / her imaginative world can no longer be discovered. The purpose of the paper is to prove that adaptation is a dangerous tool which leads to variants that sometimes reflect the original less than the reader would wish to. It contradicts the very essence of the process of translation which is that of making an original work available in a foreign language. If the adaptive transformative codes are so flexible that they encourage the translator to repeatedly leave out parts of the original work, then a subversive pattern emerges which changes the entire book. In conclusion, as a result of using adaptation, manipulative or subversive effects are created in the translated work. This is generally achieved by adding new words or connotations, creating new figures of speech or using explicitations. The additional meanings of the original work are neglected and the translator creates new meanings, implications, emphases and contexts. Again s/he turns into a new author who enjoys the freedom of expressing his / her own ideas without the constraints of the original text. Reading against the grain is unadvisable during the process of translation and consequently, following personal common sense becomes essential in the field of translation as well as everywhere else, so that translation should not become a source of fantasy.

Behavioral and EEG Reactions in Native Turkic-Speaking Inhabitants of Siberia and Siberian Russians during Recognition of Syntactic Errors in Sentences in Native and Foreign Languages

The aim of the study is to compare behavioral and EEG reactions in Turkic-speaking inhabitants of Siberia (Tuvinians and Yakuts) and Russians during the recognition of syntax errors in native and foreign languages. Sixty-three healthy aboriginals of the Tyva Republic, 29 inhabitants of the Sakha (Yakutia) Republic, and 55 Russians from Novosibirsk participated in the study. EEG were recorded during execution of error-recognition task in Russian and English language (in all participants) and in native languages (Tuvinian or Yakut Turkic-speaking inhabitants). Reaction time (RT) and quality of task execution were chosen as behavioral measures. Amplitude and cortical distribution of P300 and P600 peaks of ERP were used as a measure of speech-related brain activity. In Tuvinians, there were no differences in the P300 and P600 amplitudes as well as in cortical topology for Russian and Tuvinian languages, but there was a difference for English. In Yakuts, the P300 and P600 amplitudes and topology of ERP for Russian language were the same as Russians had for native language. In Yakuts, brain reactions during Yakut and English language comprehension had no difference, while the Russian language comprehension was differed from both Yakut and English. We found out that the Tuvinians recognized both Russian and Tuvinian as native languages, and English as a foreign language. The Yakuts recognized both English and Yakut as foreign languages, but Russian as a native language. According to the inquirer, both Tuvinians and Yakuts use the national language as a spoken language, whereas they do not use it for writing. It can well be a reason that Yakuts perceive the Yakut writing language as a foreign language while writing Russian as their native.

Unveiling the Indonesian Identity through Proverbial Expressions: The Relation of Meaning between Authority and Globalization

The purpose of the study is to find out relation of moral massage between the authority and globalization in proverb. Proverb is one of the many forms of cultural identity of the Indonesian/Malay people filled with moral values. The values contained within those proverbs are beneficial not only to the society, but also to those who held power amidst on this era of globalization. The method being used is qualitative research through content analysis which is done by describing and uncovering the forms and meanings of proverbs used within Indonesia Minangkabau society. Sources for this study’s data were extracted from a Minangkabau native speaker in the sub district of Tanah Abang, Jakarta. Said sources were retrieved through a series of interviews with the Minangkabau native speaker, whose speech is still adorned with idiomatic expressions. The research findings show that there are 30 existed proverbs or idiomatic expressions in the Minangkabau language often used by its indigenous people. The thirty data contain moral values which are closely interwoven with the matter of power and globalization. Analytical results show that the fourteen moral values contained within proverbs reflect a firm connection between rule and power in globalization; such as: responsible, brave, togetherness and consensus, tolerance, politeness, thorough and meticulous, honest and keeping promise, ingenious and learning, care, self-correction, be fair, alert, arbitrary, self-awareness. Structurally, proverbs possess an unchangeably formal construction; symbolically, proverbs possess meanings that are clearly decided through ethnographic communicative factors along with situational and cultural contexts. Values contained within proverbs may be used as a guide in social management, be it between fellow men, between men and nature, or even between men and their Creator. Therefore, the meanings and values contained within the morals of proverbs could also be utilized as a counsel for those who rule and in charge of power in order to stem the tides of globalization that had already spread into sectoral, territorial and educational continuums.

An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Speaker Identification (SI) is the task of establishing identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still need of improvement. In this paper, a Closed-Set Tex-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficient (MFCC) as feature extraction and suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with Expectation Maximization algorithm (EM) for speaker modeling. The use of Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. Also investigation of Linde-Buzo-Gray (LBG) clustering algorithm for initialization of GMM, for estimating the underlying parameters, in the EM step improved the convergence rate and systems performance. It also uses relative index as confidence measures in case of contradiction in identification process by GMM and VQ as well. Simulation results carried out on voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.

Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators for with the aim of safety improvements, air traffic controller workload measurement and conducting analysis on large quantities controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested bringing automatic speech recognition into ATC, this paper gives an overview of possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.

Effect of Personality Traits on Classification of Political Orientation

Today, there is a large number of political transcripts available on the Web to be mined and used for statistical analysis, and product recommendations. As the online political resources are used for various purposes, automatically determining the political orientation on these transcripts becomes crucial. The methodologies used by machine learning algorithms to do an automatic classification are based on different features that are classified under categories such as Linguistic, Personality etc. Considering the ideological differences between Liberals and Conservatives, in this paper, the effect of Personality traits on political orientation classification is studied. The experiments in this study were based on the correlation between LIWC features and the BIG Five Personality traits. Several experiments were conducted using Convote U.S. Congressional- Speech dataset with seven benchmark classification algorithms. The different methodologies were applied on several LIWC feature sets that constituted by 8 to 64 varying number of features that are correlated to five personality traits. As results of experiments, Neuroticism trait was obtained to be the most differentiating personality trait for classification of political orientation. At the same time, it was observed that the personality trait based classification methodology gives better and comparable results with the related work.

Under the Veneer of Words Lies Power: Foucauldian Analysis of Oleanna

The notion of power and gender domination is one of the inseparable aspects of themes in postmodern literature. The reason of its importance has been discussed frequently since the rise of Michel Foucault and his insight into the circulation of power and the transgression of forces. Language and society operate as the basic grounds for the study, as all human beings are bound to the set of rules and norms which shape them in the acceptable way in the macrocosm. How different genders in different positions behave and show reactions to the provocation of social forces and superiority of one another is of great interest to writers and literary critics. Mamet’s works are noticeable for their controversial but timely themes which illustrate human conflicts with the society and greed for power. Many critics like Christopher Bigsby and Harold Bloom have discussed Mamet and his ideas in recent years. This paper is the study of Oleanna, Mamet’s masterpiece about the teacher-student relationship and the circulation of power between a man and woman. He shows the very breakable boundaries in the domination of a gender and the downfall of speech as the consequence of transgression and freedom. The failure of the language the teacher uses and the abuse of his own words by a student who seeks superiority and knowledge are the main subjects of the discussion. Supported by the ideas of Foucault, the language Mamet uses to present his characters becomes the fundamental premise in this study. As a result, language becomes both the means of achievement and downfall.

Obsession of Time and the New Musical Ontologies: The Concert for Saxophone, Daniel Kientzy and Orchestra by Myriam Marbe

For the music composer Myriam Marbe the musical time and memory represent 2 (complementary) phenomena with conclusive impact on the settlement of new musical ontologies. Summarizing the most important achievements of the contemporary techniques of composition, her vision on the microform presented in The Concert for Daniel Kientzy, saxophone and orchestra transcends the linear and unidirectional time in favour of a flexible, multivectorial speech with spiral developments, where the sound substance is auto(re)generated by analogy with the fundamental processes of the memory. The conceptual model is of an archetypal essence, the music composer being concerned with identifying the mechanisms of the creation process, especially of those specific to the collective creation (of oral tradition). Hence the spontaneity of expression, improvisation tint, free rhythm, micro-interval intonation, coloristictimbral universe dominated by multiphonics and unique sound effects, hence the atmosphere of ritual, however purged by the primary connotations and reprojected into a wonderful spectacular space. The Concert is a work of artistic maturity and enforces respect, among others, by the timbral diversity of the three species of saxophone required by the music composer (baritone, sopranino and alt), in Part III Daniel Kientzy shows the performance of playing two saxophones concomitantly. The score of the music composer Myriam Marbe contains a deeply spiritualized music, full or archetypal symbols, a music whose drama suggests a real cinematographic movement.

Advances in Artificial Intelligence Using Speech Recognition

This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.