Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.

Regression Test Selection Technique for Multi-Programming Language

Regression testing is a maintenance activity applied to modified software to provide confidence that the changed parts are correct and that the unchanged parts have not been adversely affected by the modifications. Regression test selection techniques reduce the cost of regression testing, by selecting a subset of an existing test suite to use in retesting modified programs. This paper presents the first general regression-test-selection technique, which based on code and allows selecting test cases for any programs written in any programming language. Then it handles incomplete program. We also describe RTSDiff, a regression-test-selection system that implements the proposed technique. The results of the empirical studied that performed in four programming languages java, C#, Cµ and Visual basic show that the efficiency and effective in reducing the size of test suit.

EASEL: Evaluation of Algorithmic Skills in an Environment Learning

This paper attempts to explore a new method to improve the teaching of algorithmic for beginners. It is well known that algorithmic is a difficult field to teach for teacher and complex to assimilate for learner. These difficulties are due to intrinsic characteristics of this field and to the manner that teachers (the majority) apprehend its bases. However, in a Technology Enhanced Learning environment (TEL), assessment, which is important and indispensable, is the most delicate phase to implement, for all problems that generate (noise...). Our objective registers in the confluence of these two axes. For this purpose, EASEL focused essentially to elaborate an assessment approach of algorithmic competences in a TEL environment. This approach consists in modeling an algorithmic solution according to basic and elementary operations which let learner draw his/her own step with all autonomy and independently to any programming language. This approach assures a trilateral assessment: summative, formative and diagnostic assessment.

Enabling Integration across Heterogeneous Care Networks

The paper shows how the CASMAS modeling language, and its associated pervasive computing architecture, can be used to facilitate continuity of care by providing members of patientcentered communities of care with a support to cooperation and knowledge sharing through the usage of electronic documents and digital devices. We consider a scenario of clearly fragmented care to show how proper mechanisms can be defined to facilitate a better integration of practices and information across heterogeneous care networks. The scenario is declined in terms of architectural components and cooperation-oriented mechanisms that make the support reactive to the evolution of the context where these communities operate.

Comparison among Various Question Generations for Decision Tree Based State Tying in Persian Language

Performance of any continuous speech recognition system is highly dependent on performance of the acoustic models. Generally, development of the robust spoken language technology relies on the availability of large amounts of data. Common way to cope with little data for training each state of Markov models is treebased state tying. This tying method applies contextual questions to tie states. Manual procedure for question generation suffers from human errors and is time consuming. Various automatically generated questions are used to construct decision tree. There are three approaches to generate questions to construct HMMs based on decision tree. One approach is based on misrecognized phonemes, another approach basically uses feature table and the other is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on FARSDAT corpus in Persian language and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions in Persian language.

Cosastudio: A Software Architecture Modeling Tool

A key aspect of the design of any software system is its architecture. An architecture description provides a formal model of the architecture in terms of components and connectors and how they are composed together. COSA (Component-Object based Software Structures), is based on object-oriented modeling and component-based modeling. The model improves the reusability by increasing extensibility, evolvability, and compositionality of the software systems. This paper presents the COSA modelling tool which help architects the possibility to verify the structural coherence of a given system and to validate its semantics with COSA approach.

Skew Detection Technique for Binary Document Images based on Hough Transform

Document image processing has become an increasingly important technology in the automation of office documentation tasks. During document scanning, skew is inevitably introduced into the incoming document image. Since the algorithm for layout analysis and character recognition are generally very sensitive to the page skew. Hence, skew detection and correction in document images are the critical steps before layout analysis. In this paper, a novel skew detection method is presented for binary document images. The method considered the some selected characters of the text which may be subjected to thinning and Hough transform to estimate skew angle accurately. Several experiments have been conducted on various types of documents such as documents containing English Documents, Journals, Text-Book, Different Languages and Document with different fonts, Documents with different resolutions, to reveal the robustness of the proposed method. The experimental results revealed that the proposed method is accurate compared to the results of well-known existing methods.

On Pattern-Based Programming towards the Discovery of Frequent Patterns

The problem of frequent pattern discovery is defined as the process of searching for patterns such as sets of features or items that appear in data frequently. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a database. Most of the proposed frequent pattern mining algorithms have been implemented with imperative programming languages. Such paradigm is inefficient when set of patterns is large and the frequent pattern is long. We suggest a high-level declarative style of programming apply to the problem of frequent pattern discovery. We consider two languages: Haskell and Prolog. Our intuitive idea is that the problem of finding frequent patterns should be efficiently and concisely implemented via a declarative paradigm since pattern matching is a fundamental feature supported by most functional languages and Prolog. Our frequent pattern mining implementation using the Haskell and Prolog languages confirms our hypothesis about conciseness of the program. The comparative performance studies on line-of-code, speed and memory usage of declarative versus imperative programming have been reported in the paper.

Generator of Hypotheses an Approach of Data Mining Based on Monotone Systems Theory

Generator of hypotheses is a new method for data mining. It makes possible to classify the source data automatically and produces a particular enumeration of patterns. Pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. Used evaluation criteria are deterministic (not probabilistic). The search results are trees - form that is easy to comprehend and interpret. Generator of hypotheses uses very effective algorithm based on the theory of monotone systems (MS) named MONSA (MONotone System Algorithm).

An Approach of Quantum Steganography through Special SSCE Code

Encrypted messages sending frequently draws the attention of third parties, perhaps causing attempts to break and reveal the original messages. Steganography is introduced to hide the existence of the communication by concealing a secret message in an appropriate carrier like text, image, audio or video. Quantum steganography where the sender (Alice) embeds her steganographic information into the cover and sends it to the receiver (Bob) over a communication channel. Alice and Bob share an algorithm and hide quantum information in the cover. An eavesdropper (Eve) without access to the algorithm can-t find out the existence of the quantum message. In this paper, a text quantum steganography technique based on the use of indefinite articles (a) or (an) in conjunction with the nonspecific or non-particular nouns in English language and quantum gate truth table have been proposed. The authors also introduced a new code representation technique (SSCE - Secret Steganography Code for Embedding) at both ends in order to achieve high level of security. Before the embedding operation each character of the secret message has been converted to SSCE Value and then embeds to cover text. Finally stego text is formed and transmits to the receiver side. At the receiver side different reverse operation has been carried out to get back the original information.

Towards an Extended SQLf: Bipolar Query Language with Preferences

Database management systems that integrate user preferences promise better solution for personalization, greater flexibility and higher quality of query responses. This paper presents a tentative work that studies and investigates approaches to express user preferences in queries. We sketch an extend capabilities of SQLf language that uses the fuzzy set theory in order to define the user preferences. For that, two essential points are considered: the first concerns the expression of user preferences in SQLf by so-called fuzzy commensurable predicates set. The second concerns the bipolar way in which these user preferences are expressed on mandatory and/or optional preferences.

A Mark-Up Approach to Add Value

This paper presents a mark-up approach to service creation in Next Generation Networks. The approach allows deriving added value from network functions exposed by Parlay/OSA (Open Service Access) interfaces. With OSA interfaces service logic scripts might be executed both on callrelated and call-unrelated events. To illustrate the approach XMLbased language constructions for data and method definitions, flow control, time measuring and supervision and database access are given and an example of OSA application is considered.

3D Rendering of American Sign Language Finger-Spelling: A Comparative Study of Two Animation Techniques

In this paper we report a study aimed at determining the most effective animation technique for representing ASL (American Sign Language) finger-spelling. Specifically, in the study we compare two commonly used 3D computer animation methods (keyframe animation and motion capture) in order to ascertain which technique produces the most 'accurate', 'readable', and 'close to actual signing' (i.e. realistic) rendering of ASL finger-spelling. To accomplish this goal we have developed 20 animated clips of fingerspelled words and we have designed an experiment consisting of a web survey with rating questions. 71 subjects ages 19-45 participated in the study. Results showed that recognition of the words was correlated with the method used to animate the signs. In particular, keyframe technique produced the most accurate representation of the signs (i.e., participants were more likely to identify the words correctly in keyframed sequences rather than in motion captured ones). Further, findings showed that the animation method had an effect on the reported scores for readability and closeness to actual signing; the estimated marginal mean readability and closeness was greater for keyframed signs than for motion captured signs. To our knowledge, this is the first study aimed at measuring and comparing accuracy, readability and realism of ASL animations produced with different techniques.

Lattice Boltzmann Simulation of Binary Mixture Diffusion Using Modern Graphics Processors

A highly optimized implementation of binary mixture diffusion with no initial bulk velocity on graphics processors is presented. The lattice Boltzmann model is employed for simulating the binary diffusion of oxygen and nitrogen into each other with different initial concentration distributions. Simulations have been performed using the latest proposed lattice Boltzmann model that satisfies both the indifferentiability principle and the H-theorem for multi-component gas mixtures. Contemporary numerical optimization techniques such as memory alignment and increasing the multiprocessor occupancy are exploited along with some novel optimization strategies to enhance the computational performance on graphics processors using the C for CUDA programming language. Speedup of more than two orders of magnitude over single-core processors is achieved on a variety of Graphical Processing Unit (GPU) devices ranging from conventional graphics cards to advanced, high-end GPUs, while the numerical results are in excellent agreement with the available analytical and numerical data in the literature.

N-Grams: A Tool for Repairing Word Order Errors in Ill-formed Texts

This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. A possible way for reordering the words is to use all the permutations. The problem is that for a sentence with length N words the number of all permutations is N!. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed in order to reduce the search space among permuted sentences. The limitation of search space is succeeded using the statistical inference of N-grams. The results of this technique are very interesting and prove that the number of permuted sentences can be reduced by 98,16%. For experimental purposes a test set of TOEFL sentences was used and the results show that more than 95% can be repaired using the proposed method.

Echo State Networks for Arabic Phoneme Recognition

This paper presents an ESN-based Arabic phoneme recognition system trained with supervised, forced and combined supervised/forced supervised learning algorithms. Mel-Frequency Cepstrum Coefficients (MFCCs) and Linear Predictive Code (LPC) techniques are used and compared as the input feature extraction technique. The system is evaluated using 6 speakers from the King Abdulaziz Arabic Phonetics Database (KAPD) for Saudi Arabia dialectic and 34 speakers from the Center for Spoken Language Understanding (CSLU2002) database of speakers with different dialectics from 12 Arabic countries. Results for the KAPD and CSLU2002 Arabic databases show phoneme recognition performances of 72.31% and 38.20% respectively.

Model Transformation with a Visual Control Flow Language

Graph rewriting-based visual model processing is a widely used technique for model transformation. Visual model transformations often need to follow an algorithm that requires a strict control over the execution sequence of the transformation steps. Therefore, in Visual Model Processors (VMPs) the execution order of the transformation steps is crucial. This paper presents the visual control flow support of Visual Modeling and Transformation System (VMTS), which facilitates composing complex model transformations of simple transformation steps and executing them. The VMTS Visual Control Flow Language (VCFL) uses stereotyped activity diagrams to specify control flow structures and OCL constraints to choose between different control flow branches. This paper introduces VCFL, discusses its termination properties and provides an algorithm to support the termination analysis of VCFL transformations.

A Black-box Approach for Response Quality Evaluation of Conversational Agent Systems

The evaluation of conversational agents or chatterbots question answering systems is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation is never a problem as information retrieval-based metrics are readily available for use. However, when chatterbots began to become more domain specific, evaluation becomes a real issue. This is especially true when understanding and reasoning is required to cater for a wider variety of questions and at the same time to achieve high quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and the call for new standard measures and related considerations are brought forward. As a short-term solution for evaluating response quality of conversational agents, and to demonstrate the challenges in evaluating systems of different nature, this research proposes a blackbox approach using observation, classification scheme and a scoring mechanism to assess and rank three example systems, AnswerBus, START and AINI.

Himmapan Creatures: The Tactile Texture Designed for the Blind

The main purpose of this research aimed to create tactile texture designed media for the blind used for extra learning outside classrooms in order to enhance imagination of the blind about Himmapan creatures, furthermore, the main objective of the research focused on improving the visual disabled perception to be equal to normal people. The target group of the research is blinded students studying in The Bangkok school for the blind between grade 4-6 in the second semester of 2011 who are able to read the braille language. The research methodology consisted of the field study and the documentary study related to the blind, tactile texture designed media and Himmapan creatures. 10 pictures of tactile texture designed media were created in the designing process which began after the analysis had conducted based the primary and secondary data. The works had presented to experts in the visual disabled field who evaluated the works. After approval, the works used as prototype to teach the blind. KeywordsBlind, Himmapan Creatures, Tactile Texture.

English Language Learning Strategies Used by University Students: A Case Study of English and Business English Major at Suan Sunandha Rajabhat in Bangkok

The purposes of this research are 1) to study English language learning strategies used by the fourth-year students majoring in English and Business English, 2) to study the English language learning strategies which have an affect on English learning achievement, and 3) to compare the English language learning strategies used by the students majoring in English and Business English. The population and sampling comprise of 139 university students of the Suan Sunandha Rajabhat University. Research instruments are language learning strategies questionnaire which was constructed by the researcher and improved on by three experts and the transcripts that show the results of English learning achievement. The questionnaire includes 1) Language Practice Strategy 2)Memory Strategy 3) Communication Strategy 4)Making an Intelligent Guess or Compensation Strategy 5) Self-discipline in Learning Management Strategy 6) Affective Strategy 7)Self-Monitoring Strategy 8) Self-studySkill Strategy. Statistics used in the study are mean, standard deviation, T-test and One Way ANOVA, Pearson product moment correlation coefficient and Regression Analysis. The results of the findings reveal that the English language learning strategies most frequently used by the students are affective strategy, making an intelligent guess or compensation strategy, self-studyskill strategy and self-monitoring strategy respectively. The aspect of making an intelligent guess or compensation strategy had the most significant affect on English learning achievement. It is found that the English language learning strategies mostly used by the Business English major students and moderately used by the English major students. Their language practice strategies uses were significantly different at the 0.05 level and their communication strategies uses were significantly different at the 0.01 level. In addition, it is found that the poor students and the fair ones most frequently used affective strategy while the good ones most frequently used making an intelligent guess or compensation strategy. KeywordsEnglish language, language learning strategies, English learning achievement, and students majoring in English, Business English. Pranee Pathomchaiwat is an Assistant Professor in Business English Program, Suan Sunandha Rajabhat University, Bangkok, Thailand (e-mail: [email protected]).