Abstract: This paper explores the scalability issues associated
with solving the Named Entity Recognition (NER) problem using
Support Vector Machines (SVM) and high-dimensional features. The
performance results of a set of experiments conducted using binary
and multi-class SVM with increasing training data sizes are
examined. The NER domain chosen for these experiments is the
biomedical publications domain, especially selected due to its
importance and inherent challenges. A simple machine learning
approach is used that eliminates prior language knowledge such as
part-of-speech or noun phrase tagging thereby allowing for its
applicability across languages. No domain-specific knowledge is
included. The accuracy measures achieved are comparable to those
obtained using more complex approaches, which constitutes a
motivation to investigate ways to improve the scalability of multi-class
SVM in order to make the solution more practical and usable.
Improving training time of multi-class SVM would make support
vector machines a more viable and practical machine learning
solution for real-world problems with large datasets. An initial
prototype greatly improves training time at the
expense of higher memory requirements.
Abstract: Regression testing is a maintenance activity applied to
modified software to provide confidence that the changed parts are
correct and that the unchanged parts have not been adversely affected
by the modifications. Regression test selection techniques reduce the
cost of regression testing by selecting a subset of an existing test
suite to use in retesting modified programs. This paper presents the
first general regression-test-selection technique, which is code-based
and allows test cases to be selected for programs written in any
programming language. It also handles incomplete programs. We
also describe RTSDiff, a regression-test-selection system that
implements the proposed technique. The results of empirical
studies performed in four programming languages (Java, C#, Cµ,
and Visual Basic) show that the technique is efficient and effective in reducing
the size of the test suite.
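The selection idea described above can be illustrated with a minimal sketch. This is not RTSDiff itself: it assumes that each test has a recorded set of code entities it exercises and that a diff of the two program versions yields a set of changed entities. The names `select_tests`, `coverage`, and `changed` are hypothetical.

```python
def select_tests(coverage, changed):
    """Return (sorted) the tests whose covered entities intersect the changed set.

    coverage: dict mapping test name -> set of code entities the test exercises
    changed:  set of code entities that differ between the two program versions
    Both inputs are assumptions of this sketch, not part of the paper.
    """
    changed = set(changed)
    return sorted(
        test for test, covered in coverage.items() if changed & set(covered)
    )


# Hypothetical coverage data for three tests.
coverage = {
    "test_login": {"auth.check", "db.read"},
    "test_report": {"report.render", "db.read"},
    "test_export": {"export.csv"},
}

# Only tests that touch the modified entity need to be rerun.
selected = select_tests(coverage, {"db.read"})
print(selected)  # ['test_login', 'test_report']
```

Tests that never touch a changed entity (here `test_export`) are skipped, which is where the reduction in test suite size comes from.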
Abstract: This paper explores a new method to
improve the teaching of algorithmics to beginners. It is well known
that algorithmics is a difficult field for teachers to teach and for learners to
assimilate. These difficulties are due to intrinsic
characteristics of the field and to the way most teachers
approach its foundations. Moreover, in a Technology Enhanced
Learning (TEL) environment, assessment, which is important and
indispensable, is the most delicate phase to implement because of the
problems it generates (noise...). Our objective lies at the
confluence of these two axes. For this purpose, EASEL focuses
essentially on elaborating an approach for assessing algorithmic
competences in a TEL environment. This approach consists in
modeling an algorithmic solution in terms of basic, elementary
operations, which lets learners work out their own steps autonomously
and independently of any programming language. This approach
provides a threefold assessment: summative, formative, and
diagnostic.
Abstract: The paper shows how the CASMAS modeling language,
and its associated pervasive computing architecture, can be
used to facilitate continuity of care by providing members of patient-centered
communities of care with support for cooperation and
knowledge sharing through the use of electronic documents and
digital devices. We consider a scenario of clearly fragmented care to
show how proper mechanisms can be defined to facilitate a better
integration of practices and information across heterogeneous care
networks. The scenario is articulated in terms of architectural components
and cooperation-oriented mechanisms that make the support
reactive to the evolution of the context where these communities
operate.
Abstract: The performance of any continuous speech recognition system depends heavily on the performance of its acoustic models. In general, the development of robust spoken language technology relies on the availability of large amounts of data. A common way to cope with having little data for training each state of the Markov models is tree-based state tying. This tying method applies contextual questions to tie states. Manual question generation suffers from human error and is time-consuming. Various automatically generated questions are used instead to construct the decision tree. There are three approaches to generating questions for constructing decision-tree-based HMMs. One approach is based on misrecognized phonemes, another uses a feature table, and the third is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on the FARSDAT corpus of the Persian language, and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions for Persian.
Abstract: A key aspect of the design of any software system is
its architecture. An architecture description provides a formal model
of the architecture in terms of components and connectors and how
they are composed together. COSA (Component-Object based
Software Structures) is based on object-oriented modeling and
component-based modeling. The model improves reusability by
increasing the extensibility, evolvability, and compositionality of
software systems. This paper presents the COSA modelling tool,
which enables architects to verify the structural coherence
of a given system and to validate its semantics with the COSA approach.
Abstract: Document image processing has become an
increasingly important technology in the automation of office
documentation tasks. During document scanning, skew is inevitably
introduced into the incoming document image. Since the algorithms
for layout analysis and character recognition are generally very
sensitive to page skew, skew detection and correction in
document images are critical steps before layout analysis. In this
paper, a novel skew detection method is presented for binary
document images. The method considers selected
characters of the text, which may be subjected to thinning and the Hough
transform to estimate the skew angle accurately. Several experiments
have been conducted on various types of documents, such as
English documents, journals, textbooks,
documents in different languages, documents with different fonts, and documents
with different resolutions, to demonstrate the robustness of the proposed
method. The experimental results show that the proposed method
is accurate compared to well-known existing methods.
Abstract: The problem of frequent pattern discovery is defined
as the process of searching for patterns such as sets of features or items that appear in data frequently. Finding such frequent patterns
has become an important data mining task because it reveals associations, correlations, and many other interesting relationships
hidden in a database. Most of the proposed frequent pattern mining
algorithms have been implemented with imperative programming
languages. Such a paradigm is inefficient when the set of patterns is large
and the frequent patterns are long. We suggest applying a high-level declarative
style of programming to the problem of frequent pattern
discovery. We consider two languages: Haskell and Prolog. Our
intuition is that the problem of finding frequent patterns can
be implemented efficiently and concisely in a declarative paradigm,
since pattern matching is a fundamental feature supported by most
functional languages and Prolog. Our frequent pattern mining
implementations in Haskell and Prolog confirm our
hypothesis about the conciseness of the programs. Comparative
performance studies of declarative versus imperative programming,
covering lines of code, speed, and memory usage, are reported in the
paper.
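To make the frequent pattern discovery problem concrete, here is a minimal sketch in Python. It is not the Haskell or Prolog implementation discussed in the paper; it is a naive level-wise enumeration of itemsets with a support threshold, and the names `frequent_itemsets`, `transactions`, and `min_support` are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (a sorted tuple) to its support count.

    An itemset is frequent if it is contained in at least `min_support`
    transactions. Candidates are enumerated naively, size by size.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions)) if transactions else []
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[candidate] = count
                found = True
        if not found:
            # If no itemset of this size is frequent, no larger one can be,
            # because every subset of a frequent itemset is itself frequent.
            break
    return result


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
patterns = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(patterns.items()):
    print(itemset, count)
```

The number of candidates grows combinatorially with the number of items, which is why practical miners prune candidates far more aggressively than this sketch does.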
Abstract: Generator of hypotheses is a new method for data mining. It makes it possible to classify the source data automatically and produces a particular enumeration of patterns. A pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. The evaluation criteria used are deterministic (not probabilistic). The search results are trees, a form that is easy to comprehend and interpret. Generator of hypotheses uses a very effective algorithm based on the theory of monotone systems (MS), named MONSA (MONotone System Algorithm).
Abstract: Sending encrypted messages frequently draws the attention
of third parties, perhaps prompting attempts to break the encryption and
reveal the original messages. Steganography hides
the existence of the communication by concealing a secret message
in an appropriate carrier such as text, image, audio, or video. In quantum
steganography, the sender (Alice) embeds her steganographic
information into the cover and sends it to the receiver (Bob) over a
communication channel. Alice and Bob share an algorithm and hide
quantum information in the cover. An eavesdropper (Eve) without
access to the algorithm cannot detect the existence of the quantum
message. In this paper, a text quantum steganography technique is proposed, based
on the use of the indefinite articles "a" and "an" in conjunction with nonspecific
or non-particular nouns in the English language and a quantum
gate truth table. The authors also introduce a
new code representation technique (SSCE - Secret Steganography
Code for Embedding) used at both ends in order to achieve a high level of
security. Before the embedding operation, each character of the secret
message is converted to its SSCE value and then embedded in the cover
text. Finally, the stego text is formed and transmitted to the receiver.
At the receiver side, the reverse operations are carried out
to recover the original information.
Abstract: Database management systems that integrate user preferences promise better solutions for personalization, greater flexibility, and higher-quality query responses. This paper presents preliminary work that investigates approaches to expressing user preferences in queries. We sketch an extension of the capabilities of the SQLf language, which uses fuzzy set theory to define user preferences. Two essential points are considered: the first concerns the expression of user preferences in SQLf by a so-called set of fuzzy commensurable predicates. The second concerns the bipolar way in which these user preferences are expressed as mandatory and/or optional preferences.
Abstract: This paper presents a mark-up approach to service creation in Next Generation Networks. The approach allows added value to be derived from network functions exposed by Parlay/OSA (Open Service Access) interfaces. With OSA interfaces, service logic scripts can be executed on both call-related and call-unrelated events. To illustrate the approach, XML-based language constructs for data and method definitions, flow control, time measurement and supervision, and database access are given, and an example OSA application is considered.
Abstract: In this paper we report a study aimed at determining
the most effective animation technique for representing ASL
(American Sign Language) finger-spelling. Specifically, in the study
we compare two commonly used 3D computer animation methods
(keyframe animation and motion capture) in order to ascertain which
technique produces the most 'accurate', 'readable', and 'close to
actual signing' (i.e. realistic) rendering of ASL finger-spelling. To
accomplish this goal we have developed 20 animated clips of fingerspelled
words and we have designed an experiment consisting of a
web survey with rating questions. Seventy-one subjects aged 19-45 participated
in the study. Results showed that recognition of the words was
correlated with the method used to animate the signs. In particular,
the keyframe technique produced the most accurate representation of the
signs (i.e., participants were more likely to identify the words
correctly in keyframed sequences than in motion-captured
ones). Further, findings showed that the animation method had an
effect on the reported scores for readability and closeness to actual
signing; the estimated marginal means for readability and closeness were
greater for keyframed signs than for motion-captured signs. To our
knowledge, this is the first study aimed at measuring and comparing
accuracy, readability and realism of ASL animations produced with
different techniques.
Abstract: A highly optimized implementation of binary mixture
diffusion with no initial bulk velocity on graphics processors is
presented. The lattice Boltzmann model is employed for simulating
the binary diffusion of oxygen and nitrogen into each other with
different initial concentration distributions. Simulations have been
performed using the latest proposed lattice Boltzmann model that
satisfies both the indifferentiability principle and the H-theorem for
multi-component gas mixtures. Contemporary numerical
optimization techniques such as memory alignment and increasing
the multiprocessor occupancy are exploited along with some novel
optimization strategies to enhance the computational performance on
graphics processors using the C for CUDA programming language.
A speedup of more than two orders of magnitude over single-core
processors is achieved on a variety of Graphics Processing Unit
(GPU) devices ranging from conventional graphics cards to
advanced, high-end GPUs, while the numerical results are in
excellent agreement with the available analytical and numerical data
in the literature.
Abstract: This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. One possible way to reorder the words is to generate all permutations. The problem is that for a sentence of N words the number of permutations is N!. The novelty of this method lies in the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed to reduce the search space among permuted sentences. The search space is restricted using statistical inference over N-grams. The results show that the number of permuted sentences can be reduced by 98.16%. For experimental purposes a test set of TOEFL sentences was used, and the results show that more than 95% of them can be repaired using the proposed method.
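The trigram-hit criterion can be sketched as follows. This is only the exhaustive baseline that enumerates all N! permutations, not the confusion matrix technique the paper proposes to shrink that search space. The language model is assumed here to be a plain set of trigrams taken from a tiny hypothetical corpus, and the names `build_model` and `repair` are illustrative.

```python
from itertools import permutations


def trigrams(words):
    """Return the list of consecutive word triples in a word sequence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def build_model(corpus):
    """Collect every trigram seen in the corpus sentences (a toy language model)."""
    model = set()
    for sentence in corpus:
        model.update(trigrams(sentence.split()))
    return model


def repair(sentence, model):
    """Return the word ordering with the most trigram hits in the model.

    All N! orderings are tried, so this is only feasible for short sentences.
    Ties are resolved in favour of the earliest permutation generated, and
    the original order is generated first.
    """
    words = sentence.split()
    best = max(
        permutations(words),
        key=lambda p: sum(t in model for t in trigrams(p)),
    )
    return " ".join(best)


# Hypothetical one-sentence corpus standing in for a real language model.
model = build_model(["the cat sat on the mat"])
print(repair("cat the sat on the mat", model))  # the cat sat on the mat
```

Even this six-word sentence requires scoring 6! = 720 orderings, which illustrates why reducing the set of candidate permutations matters.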
Abstract: This paper presents an ESN-based Arabic phoneme
recognition system trained with supervised, forced and combined
supervised/forced supervised learning algorithms. Mel-Frequency
Cepstrum Coefficients (MFCCs) and Linear Predictive Code (LPC)
techniques are used and compared as the input feature extraction
technique. The system is evaluated using 6 speakers from the King
Abdulaziz Arabic Phonetics Database (KAPD) for the Saudi Arabian
dialect and 34 speakers from the Center for Spoken Language
Understanding (CSLU2002) database of speakers with different
dialects from 12 Arabic countries. Results for the KAPD and
CSLU2002 Arabic databases show phoneme recognition
performances of 72.31% and 38.20% respectively.
Abstract: Graph rewriting-based visual model processing is a
widely used technique for model transformation. Visual model
transformations often need to follow an algorithm that requires a
strict control over the execution sequence of the transformation steps.
Therefore, in Visual Model Processors (VMPs) the execution order
of the transformation steps is crucial. This paper presents the visual
control flow support of Visual Modeling and Transformation System
(VMTS), which facilitates composing complex model
transformations from simple transformation steps and executing them.
The VMTS Visual Control Flow Language (VCFL) uses stereotyped
activity diagrams to specify control flow structures and OCL
constraints to choose between different control flow branches. This
paper introduces VCFL, discusses its termination properties and
provides an algorithm to support the termination analysis of VCFL
transformations.
Abstract: The evaluation of conversational agents, or chatterbot question-answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available. However, as chatterbots became more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions while still achieving high-quality responses. This paper discusses the inappropriateness of existing measures for response quality evaluation and puts forward the call for new standard measures along with related considerations. As a short-term solution for evaluating the response quality of conversational agents, and to demonstrate the challenges in evaluating systems of different natures, this research proposes a black-box approach that uses observation, a classification scheme, and a scoring mechanism to assess and rank three example systems: AnswerBus, START, and AINI.
Abstract: The main purpose of this research was to create tactile-texture media for the blind, for use in supplementary learning outside the classroom, in order to enrich blind learners' imagination of Himmapan creatures. A further objective was to improve the perception of visually impaired people so that it is equal to that of sighted people. The target group was blind students in grades 4-6 at The Bangkok School for the Blind in the second semester of 2011 who were able to read braille. The research methodology consisted of a field study and a documentary study related to the blind, tactile-texture media, and Himmapan creatures. Ten tactile-texture pictures were created in the design process, which began after an analysis had been conducted based on the primary and secondary data. The works were presented to experts in the field of visual impairment, who evaluated them. After approval, the works were used as prototypes to teach the blind. Keywords: Blind, Himmapan Creatures, Tactile Texture.
Abstract: The purposes of this research are 1) to study the English language learning strategies used by fourth-year students majoring in English and Business English, 2) to study the English language learning strategies that have an effect on English learning achievement, and 3) to compare the English language learning strategies used by the students majoring in English and those majoring in Business English. The population and sample comprise 139 students of Suan Sunandha Rajabhat University. The research instruments are a language learning strategies questionnaire, which was constructed by the researcher and improved by three experts, and the transcripts that show the students' English learning achievement. The questionnaire covers 1) Language Practice Strategy, 2) Memory Strategy, 3) Communication Strategy, 4) Making an Intelligent Guess or Compensation Strategy, 5) Self-discipline in Learning Management Strategy, 6) Affective Strategy, 7) Self-Monitoring Strategy, and 8) Self-study Skill Strategy. The statistics used in the study are the mean, standard deviation, t-test, one-way ANOVA, Pearson product-moment correlation coefficient, and regression analysis. The findings reveal that the English language learning strategies most frequently used by the students are the affective strategy, the making an intelligent guess or compensation strategy, the self-study skill strategy, and the self-monitoring strategy, in that order. The making an intelligent guess or compensation strategy had the most significant effect on English learning achievement. It is found that the English language learning strategies were used at a high level by the Business English majors and at a moderate level by the English majors. Their use of language practice strategies differed significantly at the 0.05 level, and their use of communication strategies differed significantly at the 0.01 level.
In addition, it is found that the poor and fair students most frequently used the affective strategy, while the good students most frequently used the making an intelligent guess or compensation strategy. Keywords: English language, language learning strategies, English learning achievement, students majoring in English, Business English. Pranee Pathomchaiwat is an Assistant Professor in the Business English Program, Suan Sunandha Rajabhat University, Bangkok, Thailand.