Abstract: One of the major goals of Spoken Dialog Systems
(SDS) is to understand what the user utters.
In the SDS domain, the Spoken Language Understanding (SLU)
module classifies user utterances by means of predefined
conceptual knowledge. The SLU module is able to recognize only the
meanings previously included in its knowledge base. Because that
knowledge is so large, storing the information is a very expensive
process.
Updating and managing the knowledge base are time-consuming
and error-prone processes because of the rapidly growing number of
entities such as proper nouns and domain-specific nouns. This paper
proposes a solution to the problem of Named Entity Recognition
(NER) applied to an SDS domain. The proposed solution attempts to
automatically recognize the meaning associated with an utterance by
using the PANKOW (Pattern based Annotation through Knowledge
On the Web) method at runtime.
The proposed method extracts information from the Web to
extend the knowledge of the SLU module and reduce the development
effort. In particular, the Google Search Engine is used to extract
information from the Facebook social network.
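The core idea of PANKOW, as it is commonly described, can be sketched as follows: instantiate lexico-syntactic patterns with the entity and each candidate concept, sum the Web hit counts of the resulting phrases, and choose the concept with the highest total. This is a minimal sketch that assumes a hypothetical pattern list and a stub `hit_count` function in place of real search-engine queries; it does not reproduce the paper's Google/Facebook pipeline.

```python
from typing import Callable

# Illustrative lexico-syntactic patterns (an assumption, not the paper's exact list).
PATTERNS = [
    "{entity} is a {concept}",
    "the {entity} {concept}",
    "the {concept} {entity}",
]


def classify_entity(entity: str, concepts: list[str],
                    hit_count: Callable[[str], int]) -> str:
    """Return the concept whose instantiated patterns collect the most hits."""
    scores = {}
    for concept in concepts:
        queries = [p.format(entity=entity, concept=concept) for p in PATTERNS]
        scores[concept] = sum(hit_count(q) for q in queries)
    # max() keeps the first maximum, so ties go to the earliest candidate.
    return max(concepts, key=lambda c: scores[c])


# Hypothetical hit counts standing in for a search-engine API.
FAKE_HITS = {"Rome is a city": 900, "the city Rome": 300, "Rome is a person": 5}

print(classify_entity("Rome", ["person", "city"], lambda q: FAKE_HITS.get(q, 0)))
```

With these stub counts the sketch selects `city`; in a deployed system `hit_count` would wrap a real search API.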
Abstract: In the process of information transmission (concept verbalization), we attend mostly to the substance (content) and only then to the form. When recalling events from the remote past, we often cannot exactly reproduce the specific words we heard or pronounced, or the syntactic structures. We remember events, feelings and images; we recall the general content of the discourse. A thought acquires a specific language form only during the concept verbalization phase. With minimal time for reflection, and depending on the speaker's level of language competence, grammatical and syntactic shaping often occurs automatically, drawing on familiar models and stereotypes. This means that the language form adapts itself to consciousness, and not vice versa.
Abstract: Thailand has developed many unique forms of culture and knowledge, the foremost being Thai traditional medicine (TTM). Recently, a number of researchers have tried to preserve this indigenous knowledge; however, systems for doing so are still scarce. To preserve this ancient knowledge, we therefore developed and integrated several linguistic techniques to build a system that collects all of the recipes. The application extracts medical recipes from ancient scriptures and then normalizes their archaic words, primitive grammar and antiquated units of measurement to modern equivalents. We then applied ingredient-duplication calculation, proportion-similarity calculation and score ranking to detect duplicate recipes. We collected questionnaires from registrants and members of the public to investigate user satisfaction, and the results were satisfactory. The application assists not only registrants in checking for copyright violations during the TTM registration process but also people seeking to treat their illnesses, helping both Thai people and all of humankind to fight intractable diseases.
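The duplicate-detection step described above can be illustrated with a small sketch: normalize amounts to a common unit, compare ingredient sets and ingredient proportions, and rank stored recipes by a combined score. The unit factors, the Jaccard overlap, the L1-based proportion similarity and the equal weighting are all assumptions chosen for illustration, not the paper's exact formulas.

```python
# Illustrative conversion factors from traditional units to grams (assumed values).
UNIT_TO_GRAMS = {"g": 1.0, "baht": 15.0, "salueng": 3.75}


def normalize(recipe: dict[str, tuple[float, str]]) -> dict[str, float]:
    """Convert every amount to grams, then to a share of the recipe's total weight."""
    grams = {name: qty * UNIT_TO_GRAMS[unit] for name, (qty, unit) in recipe.items()}
    total = sum(grams.values())
    return {name: g / total for name, g in grams.items()}


def ingredient_overlap(a: dict[str, float], b: dict[str, float]) -> float:
    """Jaccard overlap of ingredient names (stand-in for ingredient-duplication calculation)."""
    return len(a.keys() & b.keys()) / len(a.keys() | b.keys())


def proportion_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """1 minus half the L1 distance between proportion vectors (1.0 = identical)."""
    names = a.keys() | b.keys()
    return 1.0 - 0.5 * sum(abs(a.get(n, 0.0) - b.get(n, 0.0)) for n in names)


def rank_duplicates(query, corpus, weight=0.5):
    """Score every stored recipe against the query and sort best match first."""
    q = normalize(query)
    scored = []
    for rid, recipe in corpus.items():
        r = normalize(recipe)
        score = weight * ingredient_overlap(q, r) + (1 - weight) * proportion_similarity(q, r)
        scored.append((score, rid))
    return sorted(scored, reverse=True)


query = {"ginger": (1, "baht"), "pepper": (4, "salueng")}  # 15 g + 15 g
corpus = {
    "A": {"ginger": (15, "g"), "pepper": (15, "g")},        # same proportions
    "B": {"ginger": (30, "g"), "turmeric": (10, "g")},
}
ranking = rank_duplicates(query, corpus)
print(ranking[0])
```

Recipe "A" uses different units but identical proportions, so after normalization it scores 1.0 and ranks first.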
Abstract: Natural language processing systems pose a unique
challenge for software architectural design as system complexity has
increased continually and systems cannot be easily constructed from
loosely coupled modules. Lexical, syntactic, semantic, and pragmatic
aspects of linguistic information are tightly coupled in a manner that
requires separation of concerns in a special way in design,
implementation and maintenance. An aspect-oriented software
architecture is proposed in this paper after critically reviewing
relevant architectural issues. For the purpose of this paper, the
syntactic aspect is characterized by an augmented context-free
grammar. The semantic aspect is composed of multiple perspectives
including denotational, operational, axiomatic and case frame
approaches. Case frame semantics matured in India from deep
thematic analysis. It is argued that lexical, syntactic, semantic and
pragmatic aspects work together in a mutually dependent way and
their synergy is best represented in the aspect-oriented approach. The
software architecture is presented with an augmented Unified
Modeling Language.
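To make the syntactic and case-frame aspects concrete, the following sketch parses a sentence with a toy context-free grammar (S -> NP VP, NP -> Det N, VP -> V NP). The grammar is augmented with a subject-verb number agreement constraint, and the parser returns a case frame (agent, action, object). The lexicon, grammar and function names are hypothetical, and the sketch does not model the aspect-oriented architecture itself.

```python
from typing import Optional

# Hypothetical toy lexicon: word -> (category, number feature).
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", "sg"), "dogs": ("N", "pl"), "bone": ("N", "sg"),
    "chases": ("V", "sg"), "chase": ("V", "pl"),
}


def category(word: str) -> Optional[str]:
    return LEXICON[word][0] if word in LEXICON else None


def parse_np(words, i):
    """NP -> Det N. Returns (head noun, number, next index) or None."""
    if i + 1 < len(words) and category(words[i]) == "Det" and category(words[i + 1]) == "N":
        return words[i + 1], LEXICON[words[i + 1]][1], i + 2
    return None


def parse_sentence(sentence: str) -> Optional[dict]:
    """S -> NP VP, VP -> V NP, augmented with subject-verb number agreement.

    Returns a case frame, or None when the grammar or the agreement check rejects the input.
    """
    words = sentence.lower().split()
    subject = parse_np(words, 0)
    if subject is None:
        return None
    agent, subject_number, i = subject
    if i >= len(words) or category(words[i]) != "V":
        return None
    verb = words[i]
    if LEXICON[verb][1] != subject_number:  # the augmentation on the CFG rule
        return None
    obj = parse_np(words, i + 1)
    if obj is None or obj[2] != len(words):
        return None
    return {"agent": agent, "action": verb, "object": obj[0]}


print(parse_sentence("The dog chases the bone"))
```

A sentence such as "The dogs chases the bone" is accepted by the bare context-free rules but rejected by the agreement augmentation.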
Abstract: In this paper, we propose a new method to describe fractal shapes using parametric L-systems. First, we introduce scaling factors into the production rules of parametric L-system grammars. Then we analyze these grammars with scaling factors using turtle algebra to show the mathematical relation between L-systems and iterated function systems (IFS). We demonstrate that, for specific values of the scaling factors, we recover the exact relationship established by Prusinkiewicz and Hammel between L-systems and IFS.
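A parametric L-system with a scaling factor can be sketched on the Koch curve. The production F(x) -> F(x*s) + F(x*s) - - F(x*s) + F(x*s) uses turns of 60 degrees and is interpreted by a turtle. The choice of the Koch rule, the symbol encoding and the function names are assumptions made for this illustration.

```python
import math

# A word is a list of symbols: ("F", length) draws forward, ("+",) / ("-",) turn.


def rewrite(word, s):
    """Apply F(x) -> F(x*s) + F(x*s) - - F(x*s) + F(x*s) to every F symbol."""
    out = []
    for sym in word:
        if sym[0] == "F":
            f = ("F", sym[1] * s)
            out += [f, ("+",), f, ("-",), ("-",), f, ("+",), f]
        else:
            out.append(sym)
    return out


def turtle(word, angle_deg=60.0):
    """Interpret the word with a turtle; return (end x, end y, total drawn length)."""
    x = y = heading = length = 0.0
    for sym in word:
        if sym[0] == "F":
            x += sym[1] * math.cos(math.radians(heading))
            y += sym[1] * math.sin(math.radians(heading))
            length += sym[1]
        elif sym[0] == "+":
            heading += angle_deg
        else:
            heading -= angle_deg
    return x, y, length


def derive(n, s):
    """Start from the axiom F(1) and apply the production n times."""
    word = [("F", 1.0)]
    for _ in range(n):
        word = rewrite(word, s)
    return word


print(turtle(derive(3, 1 / 3)))
```

In this sketch, s = 1/3 keeps the endpoint at (1, 0) while the drawn length grows as (4/3)^n. That is the behaviour of an IFS built from four similarities of ratio 1/3. Other values of s, such as 0.3, move the endpoint and break that exact correspondence.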
Abstract: Machine Translation (MT) between Thai and English has been a challenging research topic in natural language processing. Most research has addressed English-to-Thai machine translation rather than the reverse direction. This paper presents a Thai-to-English machine translation system that parses a Thai sentence into an interlingual representation, a Thai LFG tree, using an LFG grammar and a bottom-up parser. The Thai LFG tree is then transformed into the corresponding English LFG tree by pattern matching and node transformation. Finally, an equivalent English sentence is generated using the structural information prescribed by the English LFG tree. The results of experiments designed to evaluate the performance of the proposed system indicate that it is effective in providing useful translations from Thai to English.
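The transfer step (pattern matching and node transformation) can be sketched on plain phrase-structure trees. The sketch translates the leaves through a bilingual lexicon and swaps the Thai head-modifier order (N ADJ) into English order (ADJ N). The romanized example "chan chop kafae dam" ("I like black coffee"), the lexicon and the single reordering rule are illustrative assumptions. These trees are simplified stand-ins for LFG structures, not the paper's representation.

```python
# Trees are tuples: (label, child, ...) for phrases, (label, word) for leaves.
THAI_TREE = ("S",
             ("NP", ("N", "chan")),
             ("VP", ("V", "chop"),
                    ("NP", ("N", "kafae"), ("ADJ", "dam"))))

# Hypothetical bilingual lexicon used for lexical transfer.
GLOSSES = {"chan": "I", "chop": "like", "kafae": "coffee", "dam": "black"}


def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def transfer(node):
    """Map a Thai tree onto an English tree: translate leaves, reorder matched nodes."""
    if is_leaf(node):
        return (node[0], GLOSSES[node[1]])
    label, *children = node
    children = [transfer(c) for c in children]
    # Structural pattern: Thai NP -> N ADJ (modifier after head) becomes English NP -> ADJ N.
    if label == "NP" and [c[0] for c in children] == ["N", "ADJ"]:
        children.reverse()
    return (label, *children)


def linearize(node):
    """Read the words off the tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in linearize(child)]


print(" ".join(linearize(transfer(THAI_TREE))))
```

A full system would also need morphological generation (agreement, articles), which this sketch omits.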
Abstract: In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focusing on the extraction and meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging of the different unit types (title, chapter, section, enumeration, etc.).
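As a rough illustration of symbolic tagging of document units, the sketch below assigns each line one of the unit types named above using ordered surface rules. The regular expressions and the "first unmatched line is the title" heuristic are hypothetical. The paper's models exploit much richer structural context.

```python
import re

# Hypothetical surface rules; the first rule that matches decides the tag.
RULES = [
    ("chapter", re.compile(r"^chapter\s+\d+\b", re.IGNORECASE)),
    ("section", re.compile(r"^\d+(\.\d+)+\s+\S")),
    ("enumeration", re.compile(r"^([-*\u2022]|[a-z]\))\s+")),
]


def tag_lines(lines):
    """Tag each line; a non-empty first line that matches no rule is taken as the title."""
    tags = []
    for index, line in enumerate(lines):
        text = line.strip()
        tag = "paragraph"
        for name, pattern in RULES:
            if pattern.match(text):
                tag = name
                break
        else:
            if index == 0 and text:
                tag = "title"
        tags.append((tag, text))
    return tags


DOC = ["A Study of Grammars", "Chapter 1 Background", "1.1 Definitions",
       "- first item", "Plain text."]
for tag, text in tag_lines(DOC):
    print(f"{tag:12} {text}")
```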
Abstract: In syntactic pattern recognition a pattern can be
represented by a graph. Given an unknown pattern represented by
a graph g, the problem of recognition is to determine if the graph g
belongs to a language L(G) generated by a graph grammar G. The
so-called IE graphs have been defined in [1] for a description of
patterns. The IE graphs are generated by so-called ETPL(k) graph
grammars defined in [1]. An efficient parsing algorithm for ETPL(k)
graph grammars for syntactic recognition of patterns represented by
IE graphs has been presented in [1]. In practice, structural
descriptions may contain pattern distortions, so that the assignment
of a graph g, representing an unknown pattern, to
a graph language L(G) generated by an ETPL(k) graph grammar G is
rejected by the ETPL(k) type parsing. Therefore, there is a need for
constructing effective parsing algorithms for recognition of distorted
patterns. The purpose of this paper is to present a new approach to
syntactic recognition of distorted patterns. To take into account all
variations of a distorted pattern under study, a probabilistic
description of the pattern is needed. A random IE graph approach is
proposed here for such a description ([2]).
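The idea of a probabilistic pattern description can be sketched with a deliberately simplified "random graph". The node set is fixed and ordered, and each node and edge carries a probability distribution over labels. A distorted graph is then assigned to the model under which it is most probable. This is a simplification assumed for illustration; it is not the random IE graph formalism of [2], and the models and labels are hypothetical.

```python
import math

# A "random graph" here: per-node label distributions and per-edge label
# distributions over a fixed, ordered node set (a simplification for this sketch).
MODELS = {
    "arrow": ([{"a": 0.9, "b": 0.1}, {"b": 0.8, "c": 0.2}, {"c": 1.0}],
              {(0, 1): {"r": 0.7, "s": 0.3}, (1, 2): {"r": 1.0}}),
    "loop": ([{"a": 0.2, "b": 0.8}, {"b": 0.5, "c": 0.5}, {"c": 1.0}],
             {(0, 1): {"s": 0.9, "r": 0.1}, (1, 2): {"r": 1.0}}),
}


def log_prob(p):
    return math.log(p) if p > 0 else -math.inf


def log_likelihood(model, observed):
    """Log-probability that the random graph `model` produces the labelled graph `observed`."""
    node_dists, edge_dists = model
    node_labels, edge_labels = observed
    if len(node_labels) != len(node_dists) or edge_labels.keys() != edge_dists.keys():
        return -math.inf  # structure differs: this simple model cannot produce it
    total = sum(log_prob(d.get(label, 0.0)) for d, label in zip(node_dists, node_labels))
    total += sum(log_prob(edge_dists[e].get(label, 0.0)) for e, label in edge_labels.items())
    return total


def classify(models, observed):
    """Assign the (possibly distorted) graph to the most probable random graph."""
    return max(models, key=lambda name: log_likelihood(models[name], observed))


# A distorted pattern: node 1 carries the less typical label "c".
observed = (["a", "c", "c"], {(0, 1): "r", (1, 2): "r"})
print(classify(MODELS, observed))
```

Here the distorted graph has probability 0.126 under "arrow" and 0.01 under "loop". A deterministic parser would simply reject it, whereas the probabilistic description still yields a decision.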
Abstract: Modeling a distributed system allows us to
represent its entire functionality. A working instance of the system
rarely fulfils all of the functionality represented by the model; usually
some parts of this functionality need to be accessible only periodically.
A reporting system based on the Data Warehouse concept seems to
be an intuitive example of a system some of whose functionality is
required only from time to time. When analyzing the enterprise risk
associated with periodic changes in system functionality, we
should consider not only the inaccessibility of the components
(objects) but also of their functions (methods), and the impact of such a
situation on the system functionality from the business point of view.
In this paper we suggest that the risk attributes should be estimated
from risk attributes specified at the requirements level (use cases in
the UML model) on the basis of information about the structure of
the model (presented at other levels of the UML model). We argue
that it is desirable to consider the influence of periodic changes in
requirements on enterprise risk estimation. Finally, a solution of this
kind, based on the UML system model, is presented.
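One simple way to propagate requirement-level risk attributes down the UML model is sketched below. Each use case carries a business impact and the share of periods in which it is needed. Each method accumulates unavailability probability multiplied by impact and required share, summed over every use case it realizes. The attribute names and the multiplicative formula are assumptions for illustration, not the paper's estimation procedure.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: float          # business impact if the use case fails (hypothetical units)
    required_share: float  # fraction of periods in which the use case is needed, 0..1
    methods: list[str]     # methods that realize it, taken from lower UML levels


def method_risk(use_cases, unavailability):
    """Propagate requirement-level risk down to methods.

    Each method accumulates P(method unavailable) * impact * required_share
    over every use case that depends on it.
    """
    risk = {}
    for uc in use_cases:
        for m in uc.methods:
            risk[m] = risk.get(m, 0.0) + unavailability[m] * uc.impact * uc.required_share
    return risk


use_cases = [
    UseCase("Monthly report", 100.0, 0.25, ["Warehouse.aggregate", "Report.render"]),
    UseCase("Enter order", 50.0, 1.0, ["Order.save"]),
]
unavailability = {"Warehouse.aggregate": 0.02, "Report.render": 0.01, "Order.save": 0.01}
risk = method_risk(use_cases, unavailability)
print(risk, "total:", sum(risk.values()))
```

The `required_share` factor is where periodic functionality enters: a reporting use case needed only a quarter of the time contributes proportionally less risk.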
Abstract: The paper focuses on context modeling with respect to the specification of context-aware systems supporting ubiquitous applications. The proposed approach, followed within the SIMPLICITY IST project, uses a high-level system ontology to derive context models for system components, which are subsequently mapped to the system's physical entities. For the definition of user- and device-related context models in particular, the paper suggests a standards-based process consisting of an analysis phase using the Common Information Model (CIM) methodology, followed by an implementation phase that defines 3GPP-based components. The benefits of this approach are further illustrated by preliminary examples of XML grammars defining profiles, components and component instances, coupled with descriptions of the respective ubiquitous applications.
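The sketch below reads an XML device profile whose component instances carry properties, and it makes one context-aware decision from the result. The XML vocabulary (`profile`, `component`, `property`) and the image-width rule are hypothetical. They are not taken from CIM, 3GPP, or the SIMPLICITY project.

```python
import xml.etree.ElementTree as ET

# Hypothetical device profile; element and attribute names are illustrative only.
PROFILE_XML = """
<profile type="device" id="phone-1">
  <component name="display">
    <property name="width" value="320"/>
    <property name="height" value="240"/>
  </component>
  <component name="network">
    <property name="bearer" value="UMTS"/>
  </component>
</profile>
"""


def load_profile(xml_text):
    """Map each component instance to its property dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        comp.get("name"): {p.get("name"): p.get("value") for p in comp.findall("property")}
        for comp in root.findall("component")
    }


def pick_image_width(profile, available=(1024, 640, 320)):
    """Example context-aware decision: the widest image that fits the display."""
    width = int(profile["display"]["width"])
    return max((w for w in available if w <= width), default=min(available))


profile = load_profile(PROFILE_XML)
print(profile["network"]["bearer"], pick_image_width(profile))
```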
Abstract: Conventional approaches to implementing logic programming applications on embedded systems are purely software-based. As a consequence, a compiler is needed to transform the initial declarative logic program into an equivalent procedural one, which is then programmed onto the microprocessor. This approach increases the complexity of the final implementation and reduces the overall system's performance. Conversely, hardware implementations that support only logic programs cannot be used in applications where logic programs need to be intertwined with traditional procedural ones. We exploit HW/SW codesign methods to present a microprocessor capable of supporting hybrid applications that use both programming approaches. We take advantage of the close relationship between attribute grammar (AG) evaluation and knowledge engineering methods to present a programmable hardware parser that performs logic derivations. We combine it with an extension of a conventional RISC microprocessor that performs the unification process to report the success or failure of those derivations. The extended RISC microprocessor can still execute conventional procedural programs, so hybrid applications can be implemented. The presented implementation is programmable, supports the execution of hybrid applications, increases the performance of logic derivations (experimental analysis yields an approximately 1000% increase in performance) and reduces the complexity of the final implemented code. The proposed hardware design is supported by a proposed extended C language called C-AG.
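For readers unfamiliar with the unification process that the extended RISC core performs, here is a software sketch of standard first-order (Robinson-style) unification with an occurs check. The term encoding follows the Prolog convention: upper-case strings are variables, lower-case strings are constants, and tuples (functor, args...) are compound terms. This is an illustration of the algorithm only, not a model of the hardware design.

```python
def is_var(t):
    """Variables are strings starting with an upper-case letter (Prolog convention)."""
    return isinstance(t, str) and t[:1].isupper()


def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t under subst?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(a, b, subst=None):
    """Return a substitution making a and b equal, or None if none exists."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


print(unify(("parent", "X", "bob"), ("parent", "alice", "Y")))
```

A `None` result corresponds to the "failure" report for a derivation step; a substitution corresponds to "success".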
Abstract: The Petri net tool INA is well known in the
Petri net community. However, it lacks a graphical environment to
create and analyse INA models. Building a modelling tool for
design and analysis from scratch (for the INA tool, for example) is
generally a prohibitive task. The meta-modelling approach is useful to
deal with such problems, since it allows the modelling of the
formalisms themselves. In this paper, we propose an approach based
on the combined use of Meta-modelling and Graph Grammars to
automatically generate a visual modelling tool for INA for analysis
purposes. In our approach, the UML Class diagram formalism is
used to define a meta-model of INA models. The meta-modelling
tool ATOM3 is used to generate a visual modelling tool according to
the proposed INA meta-model. We have also proposed a graph
grammar to automatically generate the INA description of the
graphically specified Petri net models. This spares the user the
errors that arise when this description is written manually. The INA tool
is then used to simulate and analyse the resulting INA
description. Our environment is illustrated through an example.
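The generation step can be pictured as rules that visit each graphical element (place, transition, arc) and emit a line of textual description, after checking the bipartite structure that manual editing tends to break. This is a minimal sketch: the output format is hypothetical and simplified, it is not INA's actual input syntax, and plain Python replaces the ATOM3 graph-grammar rules.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    places: dict[str, int]  # place name -> initial number of tokens
    transitions: list[str]
    arcs: list[tuple[str, str, int]] = field(default_factory=list)  # (source, target, weight)


def describe(net: PetriNet) -> str:
    """Emit one description line per graphical element.

    The output format is hypothetical and simplified; it is not INA's input syntax.
    """
    for src, dst, _ in net.arcs:
        # Bipartite check: an arc must join a place and a transition.
        if (src in net.places) == (dst in net.places):
            raise ValueError(f"arc {src}->{dst} must join a place and a transition")
    lines = [f"place {p} tokens={m}" for p, m in net.places.items()]
    for t in net.transitions:
        pre = ", ".join(f"{s}*{w}" for s, d, w in net.arcs if d == t)
        post = ", ".join(f"{d}*{w}" for s, d, w in net.arcs if s == t)
        lines.append(f"transition {t} pre=[{pre}] post=[{post}]")
    return "\n".join(lines)


net = PetriNet({"p1": 1, "p2": 0}, ["t1"], [("p1", "t1", 1), ("t1", "p2", 1)])
print(describe(net))
```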
Abstract: Pattern matching based on regular tree grammars has been widely used in many areas of computer science. In this paper, we propose a pattern matcher within the framework of code generation, based on a generic and formalized approach. According to this approach, parsers for regular tree grammars are adapted to a general pattern matching solution, rather than adapting the pattern matching to their parsing behavior. Hence, we first formalize the construction of the pattern matches with respect to input trees drawn from a regular tree grammar in the form of so-called match trees. Then, we adopt a recently developed generic parser and tightly couple its parsing behavior with this construction. In addition to its generality, the resulting pattern matcher is characterized by its soundness and efficient implementation. This is demonstrated by the proposed theory and by the derived algorithms for its implementation. A comparison with similar and well-known approaches, such as those based on tree automata and LR parsers, has shown that our pattern matcher can be applied to a broader class of grammars and achieves a better approximation of pattern matches in one pass. Furthermore, its use as a machine code selector is characterized by minimal overhead, owing to the balanced distribution of the cost computations into static ones, during parser generation time, and dynamic ones, during parsing time.
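For context, the sketch below shows the classic bottom-up dynamic-programming labeling used by tree-grammar code selectors (iburg-style). For every node it records the cheapest rule deriving each nonterminal, including chain rules. It is a baseline illustration of cost-based pattern matching over a regular tree grammar in normal form, not the authors' generic-parser construction. The grammar, costs and rule names are hypothetical.

```python
import math

# Toy regular tree grammar in normal form: (lhs, pattern, cost, rule name).
# A pattern is either a nonterminal (chain rule) or (operator, child nonterminals...).
RULES = [
    ("reg", ("CONST",), 1, "load_imm"),
    ("imm", ("CONST",), 0, "imm"),
    ("reg", ("ADD", "reg", "reg"), 1, "add"),
    ("reg", ("ADD", "reg", "imm"), 1, "add_imm"),
    ("reg", ("LOAD", "reg"), 2, "load"),
    ("reg", "imm", 1, "move_imm"),
]


def label(node):
    """Bottom-up labeling: cheapest (cost, rule) for each nonterminal at `node`."""
    op, *kids = node
    kid_labels = [label(k) for k in kids]
    best = {}
    for lhs, pat, cost, name in RULES:
        if isinstance(pat, tuple) and pat[0] == op and len(pat) - 1 == len(kids):
            total = cost
            for nt, labels in zip(pat[1:], kid_labels):
                if nt not in labels:
                    break
                total += labels[nt][0]
            else:  # every child can be derived from the required nonterminal
                if total < best.get(lhs, (math.inf,))[0]:
                    best[lhs] = (total, name)
    changed = True
    while changed:  # close under chain rules until no cost improves
        changed = False
        for lhs, pat, cost, name in RULES:
            if isinstance(pat, str) and pat in best:
                total = cost + best[pat][0]
                if total < best.get(lhs, (math.inf,))[0]:
                    best[lhs] = (total, name)
                    changed = True
    return best


tree = ("LOAD", ("ADD", ("CONST",), ("CONST",)))
print(label(tree))
```

In this example the ADD node is covered by `add_imm` (cost 2) rather than `add` (cost 3), because the right constant can be used directly as an immediate.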
Abstract: Creating 3D environments, including characters and
cities, is a significantly time-consuming process because of the large amount
of work involved in designing and modelling. There have been a
number of attempts to automatically generate 3D objects employing
shape grammars. However, it is still too early to apply the mechanism
to real problems such as real-time computer games. The purpose of this
research is to introduce a time-efficient and cost-effective method to
automatically generate various 3D objects for real-time 3D games.
This Shape grammar-based real-time City Generation (RCG) model is
a conceptual model for generating 3D environments in real time and
can be applied to 3D games or animations. The RCG system can
generate even a large city by applying fundamental principles of shape
grammars to building elements at various levels of detail in real time.
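A minimal split-grammar sketch shows how shape-grammar rules combine with levels of detail. A Building is split into Floors, and Floors into Windows, and a `lod` parameter stops the derivation early so that distant buildings stay cheap. The rule set, symbol names, dimensions and the three-level LOD scheme are assumptions for illustration, not the RCG model's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    symbol: str
    x: float
    y: float
    w: float
    h: float


def derive(shape, lod, floor_h=3.0, tile_w=2.0):
    """Expand a shape with split rules; `lod` (0, 1, 2) limits how far rules are applied."""
    if shape.symbol == "Building":
        if lod == 0:
            return [Shape("Box", shape.x, shape.y, shape.w, shape.h)]
        n = max(1, int(shape.h // floor_h))
        parts = [Shape("Floor", shape.x, shape.y + i * shape.h / n, shape.w, shape.h / n)
                 for i in range(n)]
    elif shape.symbol == "Floor":
        if lod == 1:
            return [Shape("Facade", shape.x, shape.y, shape.w, shape.h)]
        n = max(1, int(shape.w // tile_w))
        parts = [Shape("Window", shape.x + i * shape.w / n, shape.y, shape.w / n, shape.h)
                 for i in range(n)]
    else:
        return [shape]  # terminal symbol
    return [t for p in parts for t in derive(p, lod, floor_h, tile_w)]


building = Shape("Building", 0.0, 0.0, 10.0, 12.0)
for lod in (0, 1, 2):
    print(lod, len(derive(building, lod)))
```

For the 10 x 12 facade, the three detail levels yield 1, 4 and 20 terminal shapes. This is how a generator can trade detail for speed per building.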
Abstract: In this paper we propose a computational model for the representation and processing of morpho-phonological phenomena in a natural language such as Modern Greek. We aim at a unified treatment of inflection, compounding, and word-internal phonological changes, in a model that is used for both analysis and generation. We first discuss certain difficulties caused by well-known finite-state approaches, such as Koskenniemi's two-level model [7], when applied to a computational treatment of compounding. We then argue that a morphology-based model provides a more adequate account of word-internal phenomena. Unlike finite-state approaches, which cannot handle hierarchical word constituency in a satisfactory way, our strategy has a unification-based word grammar at its nucleus. This grammar takes into consideration word representations based on affixation and on [stem stem] or [stem word] compounds. In our formalism, feature-passing operations are formulated using the unification device, and phonological rules modeling the correspondence between lexical and surface forms apply at morpheme boundaries. Examples from Modern Greek illustrate our approach: morpheme structures, stress, and morphologically conditioned phoneme changes are analyzed and generated in a principled way.
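Feature passing by unification can be sketched with feature structures encoded as nested dictionaries. A stem and an ending combine only if their structures unify, so an ending from the wrong inflection class is rejected. The romanized Greek stem "anthrop-" with ending "-os" (nominative singular) and the class labels "o1"/"a1" are illustrative assumptions. The sketch omits stress and the phonological boundary rules.

```python
def unify_fs(a, b):
    """Unify two feature structures (nested dicts with atomic leaves); None on clash."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
        elif isinstance(result[key], dict) and isinstance(b_val, dict):
            sub = unify_fs(result[key], b_val)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != b_val:
            return None
    return result


# Hypothetical lexicon: romanized Greek stem and endings with invented class labels.
STEMS = {"anthrop": {"cat": "noun", "gender": "masc", "infl": {"class": "o1"}}}
SUFFIXES = {
    "os": {"infl": {"class": "o1"}, "case": "nom", "num": "sg"},
    "a": {"infl": {"class": "a1"}, "case": "nom", "num": "sg"},
}


def build_word(stem, suffix):
    """Word -> Stem Suffix: concatenate forms if the feature structures unify."""
    features = unify_fs(STEMS[stem], SUFFIXES[suffix])
    return None if features is None else (stem + suffix, features)


print(build_word("anthrop", "os"))
print(build_word("anthrop", "a"))
```

The same `unify_fs` could in principle percolate features in [stem stem] compounds, with the head's features passed up to the word node.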
Abstract: The purposes of this research are to study Thai
spoonerism words and to develop an algorithm for producing them in a
semi-automatic computer program: in the data input step, syllables
are already separated, and in the spoonerism step, the developed
algorithm is applied. The algorithm establishes rules and mechanisms
for bisyllabic Thai spoonerism words by analyzing the elements of
each syllable, namely the cluster consonant, vowel, tone mark and
final consonant. The study found that bisyllabic Thai spoonerism
has a single mechanism: the vowel, tone mark and final consonant of
the two syllables are transposed, while each syllable keeps its initial
consonant and consonant cluster (if any).
The rules and mechanisms of Thai spoonerism identified in the study
were then used to develop Thai spoonerism software in PHP. In a
performance test, the program produced bisyllabic Thai spoonerisms
correctly for 99% of the words used in the test. The remaining 1%
were errors, because words obtained from spoonerism may not be
spelled in conformity with Thai grammar, and a Thai spoonerism may
have more than one correct answer.
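The single mechanism described above can be sketched on pre-segmented syllables represented as components. Each syllable keeps its initial consonant (or cluster), and the vowel, tone mark and final consonant are swapped between the two syllables. The `Syllable` structure and the romanized example components are hypothetical. Rendering back into Thai script and checking the result against Thai spelling rules, which the abstract identifies as the source of errors, are not handled.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    initial: str  # initial consonant or consonant cluster (stays in place)
    vowel: str
    final: str    # final consonant, "" if none
    tone: str     # tone mark, "" if none


def spoonerize(a: Syllable, b: Syllable) -> tuple[Syllable, Syllable]:
    """Swap vowel, tone mark and final consonant; each syllable keeps its initial."""
    return (Syllable(a.initial, b.vowel, b.final, b.tone),
            Syllable(b.initial, a.vowel, a.final, a.tone))


# Hypothetical, romanized components of a pre-segmented two-syllable word.
first, second = Syllable("kl", "o", "m", ""), Syllable("t", "a", "", "")
print(spoonerize(first, second))
```

Applying the transposition twice restores the original word.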
Abstract: Due to the ever growing amount of publications about
protein-protein interactions, information extraction from text is
increasingly recognized as one of the crucial technologies in
bioinformatics. This paper presents a Protein Interaction Extraction
System using a Link Grammar Parser from biomedical abstracts
(PIELG). PIELG uses linkage given by the Link Grammar Parser to
start a case based analysis of contents of various syntactic roles as
well as their linguistically significant and meaningful combinations.
The system uses phrasal-prepositional verb patterns to overcome
problems with preposition combinations. The recall and precision are
74.4% and 62.65%, respectively. Experimental comparisons with two
other state-of-the-art extraction systems indicate that the PIELG system
achieves better performance. For further evaluation, the system is
augmented with a graphical package (Cytoscape) for extracting
protein interaction information from sequence databases. The results
are promising.
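The role of phrasal-prepositional verb patterns can be illustrated with a much simpler technique than link-grammar analysis. The sketch uses regular-expression matching of "protein, interaction verb phrase, protein" within one sentence. The verb list and the protein names are hypothetical. Unlike the link grammar used by PIELG, this surface matcher misses long-range and nested constructions.

```python
import re

# Hypothetical verb patterns, including phrasal-prepositional forms such as "binds to".
VERB_PATTERNS = [r"interacts?\s+with", r"binds?\s+to", r"associates?\s+with",
                 r"phosphorylates?", r"activates?"]


def extract_interactions(sentence, proteins):
    """Return (protein, verb phrase, protein) triples matched in one sentence."""
    names = "|".join(re.escape(p) for p in sorted(proteins, key=len, reverse=True))
    verbs = "|".join(VERB_PATTERNS)
    pattern = re.compile(rf"\b({names})\s+({verbs})\s+({names})\b")
    return [m.groups() for m in pattern.finditer(sentence)]


print(extract_interactions("We show that ProtA binds to ProtB in vivo.", ["ProtA", "ProtB"]))
```

Treating "binds to" as one verb pattern is what keeps the preposition from being mistaken for the start of an unrelated phrase.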
Abstract: Although achieving zero-defect software release is
practically impossible, software industries should take maximum
care to detect defects/bugs well ahead of time, allowing only a bare
minimum to creep into the released version. Timing therefore plays an
important role in bug detection. In addition,
software quality is the major factor in the software engineering
process. Early detection can be achieved only through
static code analysis, as opposed to conventional testing.
BugCatcher.Net is a static analysis tool, which detects bugs in .NET®
languages through MSIL (Microsoft Intermediate Language)
inspection. The tool uses a parser based on finite-state automata
to carry out bug detection. Once detected, bugs need to be
corrected immediately. BugCatcher.Net facilitates correction by
proposing a corrective solution for reported warnings/bugs to end
users with minimum side effects. Moreover, the tool is also capable
of analyzing the bug trend of a program under inspection.
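Finite-state bug detection over MSIL can be sketched as a small deterministic automaton that runs over a method's opcode sequence and reports where it reaches an accepting state, together with a suggested fix. The rule format and this particular rule are hypothetical and not BugCatcher.Net's rule set. The rule flags `ldnull` immediately followed by `throw`, which makes the runtime raise a NullReferenceException instead of the intended exception.

```python
# One bug rule as a tiny DFA over MSIL opcode names (hypothetical rule format).
RULE = {
    "name": "throw of null reference",
    "delta": {(0, "ldnull"): 1, (1, "throw"): 2},
    "accept": 2,
    "suggestion": "construct and throw an exception object instead of null",
}


def scan(opcodes, rule):
    """Return the indices at which the rule's automaton reaches its accepting state."""
    hits, state = [], 0
    for i, op in enumerate(opcodes):
        # On a missing transition, restart from the initial state with the same opcode.
        state = rule["delta"].get((state, op), rule["delta"].get((0, op), 0))
        if state == rule["accept"]:
            hits.append(i)
            state = 0
    return hits


body = ["ldarg.0", "brtrue.s", "ldnull", "throw", "ret"]
for i in scan(body, RULE):
    print(f"instruction {i}: {RULE['name']}; suggestion: {RULE['suggestion']}")
```

The simple restart-on-failure policy is enough for this two-opcode rule. Longer patterns with repeated prefixes would need proper failure transitions, as in KMP-style matchers.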
Abstract: Automatic reading of handwritten cheques is a computationally
complex process and it plays an important role in financial
risk management. Machine vision and learning provide a viable
solution to this problem. Research effort has mostly been focused
on recognizing diverse pitches of cheques and demand drafts with an
identical outline. However, most of these methods employ template matching
to localize the pitches, and such schemes could potentially
fail when applied to different types of outline maintained by the
bank. In this paper, the so-called outline problem is resolved by
a cheque information tree (CIT), which generalizes the localizing
method to extract active-region-of-entities. In addition, the weight
based density plot (WBDP) is performed to isolate text entities and
read complete pitches. Recognition is based on texture features using
neural classifiers. Legal amount is subsequently recognized by both
texture and perceptual features. A post-processing phase is invoked
to detect incorrect readings using a Type-2 grammar and a Turing
machine. The performance of the proposed system was evaluated
using cheques and demand drafts from 22 different banks. The test data
consist of a collection of 1540 leaves obtained from 10 different
account holders from each bank. Results show that this approach
can easily be deployed without significant design amendments.
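The grammar-based post-processing idea can be sketched as follows. A recognized legal amount is accepted only if its words form a well-formed English number phrase whose value equals the courtesy (numeric) amount. The small grammar, the recursive-descent recognizer and the function names are assumptions for illustration; they are not the paper's Type-2 grammar or its Turing-machine formulation.

```python
UNITS = {w: i for i, w in enumerate(
    "one two three four five six seven eight nine".split(), start=1)}
TEENS = {w: i for i, w in enumerate(
    "ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split(),
    start=10)}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def parse_group(words, i):
    """group -> [UNIT 'hundred'] (TENS [UNIT] | TEEN | UNIT)?; returns (value, next index) or None."""
    start, value = i, 0
    if i + 1 < len(words) and words[i] in UNITS and words[i + 1] == "hundred":
        value, i = UNITS[words[i]] * 100, i + 2
    if i < len(words) and words[i] in TENS:
        value, i = value + TENS[words[i]], i + 1
        if i < len(words) and words[i] in UNITS:
            value, i = value + UNITS[words[i]], i + 1
    elif i < len(words) and words[i] in TEENS:
        value, i = value + TEENS[words[i]], i + 1
    elif i < len(words) and words[i] in UNITS:
        value, i = value + UNITS[words[i]], i + 1
    return (value, i) if i > start else None


def parse_amount(text):
    """amount -> group ['thousand' [group]]; returns the value, or None if ill-formed."""
    words = text.lower().replace("-", " ").split()
    first = parse_group(words, 0)
    if first is None:
        return None
    total, i = first
    if i < len(words) and words[i] == "thousand":
        total, i = total * 1000, i + 1
        rest = parse_group(words, i)
        if rest is not None:
            total, i = total + rest[0], rest[1]
    return total if i == len(words) else None


def amounts_agree(legal_text, courtesy_amount):
    """Post-processing check: flag a reading whose words are ill-formed or disagree."""
    value = parse_amount(legal_text)
    return value is not None and value == courtesy_amount


print(amounts_agree("one thousand two hundred fifty", 1250))
```

A reading such as "fifty twenty" is rejected as ungrammatical, so it would be flagged for review rather than accepted.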
Abstract: There are multiple reasons to expect that detecting
word order errors in a text will be difficult, and detection
rates reported in the literature are in fact low. Although grammatical
rules constructed by computational linguists improve the performance of
grammar checkers in word order diagnosis, the repair task is still
very difficult. This paper presents an approach for repairing word
order errors in English text by reordering words in a sentence and
choosing the version that maximizes the number of trigram hits
according to a language model. The novelty of this method concerns
the use of an efficient confusion matrix technique for reordering the
words. The comparative advantage of this method is that it works with
a large set of words, and avoids the laborious and costly process of
collecting word order errors for creating error patterns.
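The selection criterion (keep the reordering with the most trigram hits) can be sketched as shown below. One substitution should be stated plainly: this sketch enumerates all permutations by brute force instead of using the paper's confusion matrix technique. That is feasible only for short sentences, because the number of orderings grows factorially. The toy corpus and the `<s>`/`</s>` padding convention are assumptions.

```python
from itertools import permutations


def build_lm(sentences):
    """Collect the set of trigrams (with sentence-boundary padding) seen in a corpus."""
    lm = set()
    for s in sentences:
        padded = ["<s>", "<s>"] + s.split() + ["</s>"]
        lm.update(tuple(padded[i:i + 3]) for i in range(len(padded) - 2))
    return lm


def trigram_hits(words, lm):
    """Count how many trigrams of the padded word sequence occur in the model."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    return sum(tuple(padded[i:i + 3]) in lm for i in range(len(padded) - 2))


def repair(sentence, lm):
    """Return the word order with the most trigram hits.

    Ties keep the earliest permutation, and permutations() yields the original order first.
    """
    words = sentence.split()
    return " ".join(max(permutations(words), key=lambda p: trigram_hits(p, lm)))


lm = build_lm(["he is very happy", "she is very tired"])
print(repair("he very is happy", lm))
```

Because ties favour the original order, a sentence that is already as good as any reordering is left unchanged.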