Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction

Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocations of words, attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept “Machine Learning”). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting/not interesting. From these examples, the ROGER algorithm learns a numerical function, inducing some ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (Area Under the ROC curve). This approach uses a particular representation for the word collocations, namely the vector of values corresponding to the standard statistical interestingness measures attached to this collocation. As this representation is general (over corpora and natural languages), generality tests were performed by applying the ranking function learned from an English corpus in Biology to a French corpus of Curriculum Vitae, and vice versa, showing a good robustness of the approach compared to the state-of-the-art Support Vector Machine (SVM).
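
The ranking criterion described above can be illustrated with a small sketch. It assumes the learned numerical function is a linear combination of the interestingness measures (the abstract only says "a numerical function"), and it uses made-up measure values and labels. The genetic search over the weights is not shown; only the quantity it would maximize, the Area Under the ROC curve, is computed.

```python
from typing import Sequence


def score(weights: Sequence[float], measures: Sequence[float]) -> float:
    """Assumed linear ranking function over one collocation's measure vector."""
    return sum(w * m for w, m in zip(weights, measures))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (interesting, not interesting) pairs ranked in the right order.
    Tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two made-up measure values per collocation.
examples = [[0.9, 0.2], [0.7, 0.4], [0.3, 0.8], [0.1, 0.1]]
labels = [1, 1, 0, 0]          # 1 = labelled interesting, 0 = not interesting
weights = [1.0, -0.5]          # candidate weights a search procedure might propose

scores = [score(weights, x) for x in examples]
print(auc(scores, labels))
```

A search procedure such as a genetic algorithm would evaluate `auc` for many candidate weight vectors and keep the best ones.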

Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis

In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as an acoustic reference to be respected. Accordingly, many studies have addressed acoustic unit classification. This type of classification allows units to be separated according to their symbolic characteristics. Indeed, target cost and concatenation cost were classically defined for unit selection. In corpus-based speech synthesis systems that use large text corpora, cost functions have been limited to a juxtaposition of symbolic criteria, and the acoustic information of units is not exploited in the definition of the target cost. In this paper, we take into consideration the phonetic information of units corresponding to acoustic information. This is realized by defining a probabilistic linguistic bi-gram model used for unit selection. The selected units are extracted from the English TIMIT corpus.
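
As a rough illustration of how a bi-gram model can score candidate unit sequences, the sketch below estimates bi-gram probabilities from a handful of unit sequences and compares two candidates. The add-one smoothing, the unit labels, and the toy training data are all assumptions made for the example; the abstract does not specify how the model is estimated or combined with other costs.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigrams(sequences):
    """Count contexts and bi-grams over unit sequences."""
    contexts = Counter()
    bigrams = Counter()
    vocab = set()
    for seq in sequences:
        padded = [START] + list(seq)
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
        vocab.update(seq)
    return contexts, bigrams, vocab


def log_prob(seq, model):
    """Log-probability of a unit sequence under the add-one smoothed model."""
    contexts, bigrams, vocab = model
    padded = [START] + list(seq)
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(padded, padded[1:])
    )


# Made-up phone-like unit sequences, for illustration only.
model = train_bigrams([["dh", "ax"], ["dh", "ax"], ["k", "ae", "t"]])
seen = log_prob(["dh", "ax"], model)
unseen = log_prob(["ax", "dh"], model)
print(seen > unseen)
```

A unit selection search would prefer the candidate sequence with the higher log-probability, possibly alongside other costs.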

Awareness of Reading Strategies among EFL Learners at Bangkok University

This questionnaire-based study aimed to measure and compare the awareness of English reading strategies among EFL learners at Bangkok University (BU), classified by their gender, field of study, and English learning experience. Proportional stratified random sampling was employed to formulate a sample of 380 BU students. The data were statistically analyzed in terms of the mean and standard deviation. A t-test was used to find differences in awareness of reading strategies between two groups (male and female; science and social-science students). In addition, one-way analysis of variance (ANOVA) was used to compare reading strategy awareness among BU students with different lengths of English learning experience. The results of this study indicated that the overall awareness of reading strategies of EFL learners at BU was at a high level (mean = 3.60) and that there was no statistically significant difference between males and females, or among students with different lengths of English learning experience, at the significance level of 0.05. However, significant differences among students coming from different fields of study were found at the same level of significance.

A Validity and Reliability Study of Grasha-Riechmann Student Learning Style Scale

The reliability of the tools developed to identify learning styles is essential for determining students' learning styles in a trustworthy way. For this purpose, the psychometric features of the Grasha-Riechmann Student Learning Style Inventory developed by Grasha were studied to contribute to this field. The study was carried out on 6th, 7th, and 8th graders of 10 primary education schools in Konya. The inventory was applied twice with an interval of one month, and according to the data of this application, the reliability coefficients of the 6 sub-dimensions posited in the theory of the inventory were found to be medium. In addition, it was found that the inventory does not have the 6-factor structure represented in the theory for either Mathematics or English courses.

Communicative Competence in Technical Oral Presentation: That “Magic” Perceived by ESL Educators versus Content Experts

To date, English as a Second Language (ESL) educators involved in teaching language and communication to engineering students face an uphill task in developing graduate communicative competency. This challenge is accentuated by the apparent lack of English for Specific Purposes (ESP) materials for engineering students in the engineering curriculum. As such, most ESL educators are forced to play multiple roles. They take on roles such as curriculum designer, material writer and teacher, with limited knowledge of the disciplinary content. Previous research indicates that prospective professional engineers should possess some sub-sets of competency: technical, linguistic oral immediacy, meta-cognitive and rhetorical explanatory competence. Another study revealed that engineering students need to be equipped with technical and linguistic oral immediacy competence. However, little is known about whether these competency needs are in line with the educators' perceptions of communicative competence. This paper examines the best mix of communicative competence subsets that create the magic for engineering students in technical oral presentations. For the purpose of this study, two groups of educators were interviewed. These educators were language and communication lecturers involved in teaching a speaking course and content experts who assess students' technical oral presentations at tertiary level. The findings indicate that these two groups differ in their perceptions

A National Idea in Conditions of the Islamic Revival

Discussion and development of the principles of uniform nation formation within the Kazakhstan state has clearly become one of the most pressing questions of the day. The fact that this question, unlike many others, has not been solved "from above" and has caused lively discussion shows the growth of civil consciousness in Kazakhstan society, as well as the topicality of this theme, which can be placed in the category of fateful questions. In any case, nation building has raised civil society to a much higher level. It would be better to begin with certain definitions. The first is the word "nation". The second is the "state". These terms are so closely connected with each other that in English they are generally synonyms. In Russian, more shades of these terms exist. For example, in Kazakhstan the citizens of the country, irrespective of nationality (but mainly with reference to non-Kazakhs), are called "Kazakhstanians", while the name of the titular nation is "Kazakhs". The same can be seen in Russia, where, for example, the Chechen or the Yakut are "Rossiyane", which means "the citizens of the Russian Federation", but not "Russians". The paper was written under the research project “Islam in modern Kazakhstan: the nature and outcome of the religious revival”.

Auction Theory: Bidder's Perspective in a Public Out-Cry English Auction

This paper provides an overview of the auction theory literature. We present a general review of the literature on various auctions and focus specifically on the English auction. We are interested in modelling a bidder's behavior in an English auction environment. Hence, we present an overview of the New Zealand wool auction, followed by a model that describes a bidder's decision-making behavior in the New Zealand wool auction. The mathematical assumptions in an English auction environment are demonstrated from the perspective of the New Zealand wool auction.

Determining the Gender of Korean Names for Pronoun Generation

Classifying the gender of names correctly is an important task in Korean-English machine translation. When a sentence is composed of two or more clauses and only one subject is given as a proper noun, it is important to find the gender of the proper noun for correct translation of the sentence. This is because a singular pronoun has a gender in English while it does not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun should be determined. More generally, this task can be expanded into the classification of general Korean names. This paper proposes a statistical method for this problem. By considering a name as just a sequence of syllables, it is possible to obtain statistics for each name from a collection of names. An evaluation of the proposed method shows an improvement in accuracy over simple look-up in the collection. While the accuracy of the look-up method is 64.11%, that of the proposed method is 81.49%. This implies that the proposed method is better suited to the gender classification of Korean names.
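
The idea of treating a name as a sequence of syllables can be sketched as follows. This is one possible statistical formulation, a naive Bayes classifier with add-one smoothing, chosen here only for illustration; the abstract does not state which model the paper uses. The training names and their labels are a hypothetical toy collection. In Hangul, each character is one syllable block, so iterating over the string yields syllables.

```python
import math
from collections import Counter, defaultdict


def train(names):
    """Collect per-gender syllable counts from (name, gender) pairs."""
    syllable_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for name, gender in names:
        class_counts[gender] += 1
        for syllable in name:          # one Hangul character = one syllable
            syllable_counts[gender][syllable] += 1
            vocab.add(syllable)
    return syllable_counts, class_counts, vocab


def classify(name, model):
    """Return the gender with the highest smoothed log-probability."""
    syllable_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for gender in sorted(class_counts):
        lp = math.log(class_counts[gender] / total)
        # +1 in the denominator reserves mass for syllables never seen in training
        denom = sum(syllable_counts[gender].values()) + len(vocab) + 1
        for syllable in name:
            lp += math.log((syllable_counts[gender][syllable] + 1) / denom)
        if lp > best_lp:
            best, best_lp = gender, lp
    return best


# Hypothetical toy collection; labels are for illustration only.
toy_names = [
    ("민준", "M"), ("지훈", "M"), ("준호", "M"),
    ("서연", "F"), ("지은", "F"), ("수연", "F"),
]
model = train(toy_names)
print(classify("서은", model), classify("훈호", model))
```

Unlike a plain look-up, this kind of model can still return a guess for a name that never appears in the collection, as long as some of its syllables do.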

Towards Benchmarking English Residential Gas Consumption

The UK Government has emphasized the role of Local Authorities as a key player in its flagship residential energy efficiency strategies, by identifying and targeting areas for energy efficiency improvements. Residential energy consumption in England is characterized by significant geographical variation in energy demand, which makes centralized targeting of areas for energy efficiency intervention difficult. This paper draws on research which aims to understand how demographic, social, economic, urban form and climatic factors influence the geographical variations in English residential gas consumption. The paper reports the findings of a multiple regression model that shows how 64% of the geographical variation in residential gas consumption is accounted for by variations in these factors. Results from this study, after further refinement and validation, can be used by Local Authorities to identify areas within their boundaries that have higher than expected gas consumption; these may be prime targets for energy efficiency initiatives.

Accent Identification by Clustering and Scoring Formants

There have been significant improvements in automatic voice recognition technology. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address the problem of identifying the English accented speech of speakers from different backgrounds. Once an accent is identified, the speech recognition software can utilise a training set from the appropriate accent and therefore improve the efficiency and accuracy of the speech recognition system. We introduce the Q factor, which is defined by the sum of relationships between frequencies of the formants. Four different accents were considered and tested in this research. A scoring method was introduced in order to effectively analyse accents. The proposed concept indicates that an accent could be identified by analysing its formants.

The Innovation of English Materials to Communicate the Identity of Bangpoo, Samut Prakan Province, for Ecotourism

The main purpose of this research was to study how to communicate the identity of Bangpoo, Samut Prakan province, for ecotourism. The qualitative data were collected by studying related materials, exploring the area, and conducting in-depth interviews with three groups of people: three directly responsible officers who were key informants of the district, twenty foreign tourists and five Thai tourist guides. Content analysis was used to analyze the qualitative data. The two main findings of the study were as follows. The first is the identity of Bangpoo, Samut Prakan province. The area lies near the mouth of the Gulf of Thailand and serves local people and tourists, offering rest accommodations. There are restaurants where food and drinks are served, rich mangrove forests, and the Bangpoo seaside resort. The Bangpoo seaside resort is characterized by muddy beaches where the greatest number of seagulls can be seen from March to May each year. The second is the communication of the identity of Bangpoo, Samut Prakan province, which the researcher could find and design to present in English materials. It can be summed up in 3 items: 1) the history of Bangpoo, Samut Prakan province; 2) the learning center of ecotourism: seagulls and mangrove forest; 3) how to preserve Bangpoo, Samut Prakan province, for ecotourism.

An Interactive Tool for Teaching and Learning English at Upper Primary Level for Mauritius

E-learning refers to the specific kind of learning experienced within the domain of educational technology, which can be used in or out of the classroom. In this paper, we give an overview of an e-learning platform, 'An Innovative Interactive and Online English Platform for Upper Primary Students', an interactive web-based application which will serve as an aid to primary school students in Mauritius. The objectives of this platform are to offer quality learning resources for the English subject at our primary level of education, encourage self-learning and hence promote e-learning. The platform developed consists of several interesting features, for example, the English Verb Conjugation tool, Negative Form tool, Interrogative Form tool and Close Test Generator. Thus, this learning platform will be useful at a time when our country is looking for an alternative to private tuition and is also looking to increase the pass rate.

Formant Tracking Linear Prediction Model using HMMs for Noisy Speech Processing

This paper presents a formant-tracking linear prediction (FTLP) model for speech processing in noise. The main focus of this work is the detection of formant trajectories based on Hidden Markov Models (HMMs), for improved formant estimation in noise. The approach proposed in this paper provides a systematic framework for modelling and utilization of a time-sequence of peaks which satisfies continuity constraints on parameter; the within peaks are modelled by the LP parameters. The formant tracking LP model estimation is composed of three stages: (1) a pre-cleaning multi-band spectral subtraction stage to reduce the effect of residue noise on formants; (2) an estimation stage where an initial estimate of the LP model of speech for each frame is obtained; (3) a formant classification stage using probability models of formants and Viterbi decoders. The evaluation results for the estimation of the formant tracking LP model, tested in a Gaussian white noise background, demonstrate that the proposed combination of the initial noise reduction stage with formant tracking and LPC variable order analysis results in a significant reduction in errors and distortions. The performance was evaluated with noisy natural vowels extracted from international French and English vocabulary speech signals at an SNR value of 10 dB. In each case, the estimated formants are compared to reference formants.
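
The role of Viterbi decoding in enforcing continuity across frames can be illustrated with a simplified sketch. It is a cost-based dynamic programming search over candidate spectral peaks, not the HMM formulation of the paper: the absolute-difference costs, the weights, and the `expected_hz` prior are assumptions chosen for the example, and the candidate peaks are made-up values.

```python
def track_formant(candidates, expected_hz, jump_weight=1.0, prior_weight=0.1):
    """Pick one peak per frame so that the track stays close to an
    expected frequency and changes smoothly between frames.

    candidates: list of frames, each a non-empty list of peak frequencies (Hz).
    """
    # Cost of starting the track at each candidate of the first frame.
    prev_cost = [prior_weight * abs(f - expected_hz) for f in candidates[0]]
    backpointers = []

    for t in range(1, len(candidates)):
        cur_cost, cur_back = [], []
        for f in candidates[t]:
            # Cumulative cost of arriving at f from each previous candidate.
            options = [
                prev_cost[k] + jump_weight * abs(f - g)
                for k, g in enumerate(candidates[t - 1])
            ]
            k_best = min(range(len(options)), key=options.__getitem__)
            cur_cost.append(options[k_best] + prior_weight * abs(f - expected_hz))
            cur_back.append(k_best)
        backpointers.append(cur_back)
        prev_cost = cur_cost

    # Trace the cheapest path back from the last frame.
    j = min(range(len(prev_cost)), key=prev_cost.__getitem__)
    path = [j]
    for cur_back in reversed(backpointers):
        j = cur_back[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


# Made-up peaks for three frames; the search should follow the ~500 Hz track.
frames = [[500, 1500], [520, 1480], [1490, 510]]
print(track_formant(frames, expected_hz=500))
```

In an HMM-based tracker the hand-set costs would be replaced by negative log emission and transition probabilities learned from data.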

Inferring Hierarchical Pronunciation Rules from a Phonetic Dictionary

This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the training dictionary and that can be applied to out-of-vocabulary words. The proposed approach improves upon existing rule-tree-based techniques in that it makes use of graphemes, rather than letters, as elementary orthographic units. A new linear algorithm for the segmentation of a word in graphemes is introduced to enable out-of-vocabulary grapheme-based phonetic transcription. Exhaustive rule trees provide a canonical representation of the pronunciation rules of a language that can be used not only to pronounce out-of-vocabulary words, but also to analyze and compare the pronunciation rules inferred from different dictionaries. The proposed approach has been implemented in C and tested on Oxford British English and Basic English. Experimental results show that grapheme-based rule trees represent phonetically sound rules and provide better performance than letter-based rule trees.
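
To make the notion of grapheme segmentation concrete, here is a minimal sketch using greedy longest-match against a small grapheme inventory. This is a generic technique swapped in for illustration, not the linear algorithm introduced in the paper, and the inventory below is a hypothetical example. With a bounded maximum grapheme length, the running time is linear in the word length.

```python
def segment(word, graphemes):
    """Split a word into graphemes by greedy longest match.

    Any single letter not covered by a multi-letter grapheme
    is emitted as a grapheme of its own.
    """
    max_len = max((len(g) for g in graphemes), default=1)
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, falling back to one letter.
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in graphemes:
                pieces.append(piece)
                i += n
                break
    return pieces


# Hypothetical multi-letter grapheme inventory for English.
INVENTORY = {"th", "sh", "ch", "ee", "igh", "ough"}

print(segment("thought", INVENTORY))
print(segment("sheep", INVENTORY))
```

Each resulting grapheme, rather than each letter, would then serve as the unit to which a context-specific pronunciation rule is attached.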

A Pragmatic Study of Metaphorization in English Newspaper Headlines

This paper attempts to explore the phenomenon of metaphorization in English newspaper headlines from the perspective of pragmatic investigation. With relevance theory as the guideline, this paper explains the processing of metaphor with a pragmatic approach and points out that metaphor is the stimulus adopted by journalists to achieve optimal relevance in this ostensive communication, as well as the strategy to fulfill their writing purpose.

Improving E-Government Services for Non-English Speaking Background (NESB) Communities in Australia

Australian government agencies have a natural desire to provide migrants a wide range of opportunities. Consequently, government online services should be equally available to migrants with a non-English speaking background (NESB). Despite the commendable efforts of governments and local agencies in Australia to provide such services, in reality, many NESB communities are not taking advantage of these services. This article, based on an extensive case study regarding the use of online government services by the Arabic NESB community in Australia, reports on the possible reasons for this issue, as well as suggestions for improvement. The conclusion is that Australia should implement ICT-based or e-government policies, programmes, and services that more accurately reflect migrant cultures and languages so that migrant integration can be more fully accomplished. Specifically, this article presents an NESB Model that adopts the value of user-centricity or a more individual-focused approach to government online services in Australia.

The Predictability and Abstractness of Language: A Study in Understanding and Usage of the English Language through Probabilistic Modeling and Frequency

Accounts of language acquisition differ significantly in their treatment of the role of prediction in language learning. In particular, nativist accounts posit that probabilistic learning about words and word sequences has little to do with how children come to use language. The accuracy of this claim was examined by testing whether distributional probabilities and frequency contributed to how well 3- to 4-year-olds repeat simple word chunks. Corresponding chunks were the same length, expressed similar content, and were all grammatically acceptable, yet the results of the study showed marked differences in performance when overall distributional frequency varied. It was found that a distributional model of language predicted the empirical findings better than a number of other models, replicating earlier findings and showing that children attend to distributional probabilities in an adult corpus. This suggests that language is based more on prediction and error than on the abstract rules that nativist accounts propose.

AGHAZ: An Expert System Based Approach for the Translation of English to Urdu

Machine Translation (MT) of English text to its Urdu equivalent is a difficult challenge. Many attempts have been made, but only a few limited solutions have been provided so far. We present a direct approach, using an expert system to translate English text into its equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1), range 0600–06FF. The expert system works with a knowledge base that contains grammatical patterns of English and Urdu, as well as a tense and gender-aware dictionary of Urdu words (with their English equivalents).
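
The Unicode range mentioned above (U+0600 to U+06FF, the Arabic block, which includes the letters used for Urdu) can be checked programmatically. The helper names below are hypothetical and are not part of the AGHAZ system; the sketch only shows how output text could be validated against that range.

```python
# U+0600..U+06FF inclusive: the range cited in the abstract.
ARABIC_BLOCK = range(0x0600, 0x0700)


def in_arabic_block(ch: str) -> bool:
    """True if a single character falls inside U+0600..U+06FF."""
    return ord(ch) in ARABIC_BLOCK


def is_urdu_script(text: str) -> bool:
    """True if every non-whitespace character is inside the block."""
    return all(in_arabic_block(ch) or ch.isspace() for ch in text)


print(is_urdu_script("اردو"))   # the word "Urdu" written in Urdu script
print(is_urdu_script("Urdu"))   # Latin letters fall outside the block
```

A check like this is deliberately strict: digits, Latin punctuation, or characters from other Arabic-script blocks would be rejected and may need separate handling.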

The Relationship between Learners' Motivation (Integrative and Instrumental) and English Proficiency among Iranian EFL Learners

The current study aims at investigating the relationship between the learners' integrative and instrumental motivation and English proficiency among Iranian EFL learners. The participants in this study consisted of 128 undergraduate university students, including 64 males and 64 females, majoring in English as a foreign language, from Shiraz Azad University. Two research instruments were used to gather the needed data for this study: 1) a language proficiency test and 2) a motivation scale which determines the type of the EFL learners' motivation. The correlation coefficient and t-test were used to analyze the collected data, and the main result was as follows: there is a significant relationship between both integrative and instrumental motivation and English proficiency among EFL learners of Shiraz Azad University.

Particular Qualities of Education in Kazakh Society

Most academics connect the theory of multiculturalism with globalization and confine it to the last decades of the 20th century. However, Kazakh society encountered this problem when Soviet rule emerged. As a result of repression, the Second World War, and the development of the virgin lands, representatives of more than 100 nationalities live in Kazakhstan. Communist ideology propagandized internationalism, which would have defined the principles of a multicultural community, but a common ideology demands a single culture. As a result, multicultural society in the USSR developed under the control of Russian culture. Education in the USSR was conducted in two departments: autochthonous and Russian. Autochthonous education narrowed student capabilities. Also, because of Soviet ideology, science was conducted in Russian: universities provided education in Russian and all scientific literature was in Russian. Exceptions were the humanities, where Kazakh departments were admitted. Naturally, non-Kazakhs studied in Russian departments; moreover, Kazakhs preferred to study in Russian, as most nowadays prefer English. As a result, Kazakh society consisted of Kazakhs, Kazakhs who recognized Russian as a mother tongue, and other nationalities who were also Russian speakers. This aspect continues to distinguish the particular qualities of the multicultural community in Kazakhstan.