Abstract: Ontologies offer a means for representing and sharing
information in many domains, particularly in complex domains. For
example, they can be used to represent and share the information
in the System Requirement Specification (SRS) of complex systems,
such as the SRS of ERTMS/ETCS, which is written in natural language.
Because this is a real-time, critical system, generic ontologies,
such as OWL and generic ERTMS ontologies, provide minimal
support for modeling the temporal information omnipresent in these SRS
documents. To support the modeling of temporal information, one
of the challenges is to enable representation of dynamic features
evolving in time within a generic ontology with a minimal redesign
of it. Separating temporal information from other information
can help to predict the system's runtime operation and to design
and implement the system properly. In addition, it is helpful to provide
reasoning and querying techniques over the temporal information
represented in the ontology in order to detect potential temporal
inconsistencies. To address this challenge, we propose a lightweight
3-layer temporal Quality of Service (QoS) ontology for representing,
reasoning and querying over temporal and non-temporal information
in a complex domain ontology. Representing QoS entities in separate
layers can clarify the distinction between the non-QoS entities
and the QoS entities in an ontology. The upper generic layer of
the proposed ontology provides an intuitive view of domain
components, especially ERTMS/ETCS components. The separation of
the intermediate QoS layer from the lower QoS layer allows us to
focus on specific QoS Characteristics, such as temporal or integrity
characteristics. In this paper, we focus on temporal information that
can be used to predict system runtime operation. To evaluate our
approach, an example of the proposed domain ontology for handover
operation, as well as a reasoning rule over temporal relations in this
domain-specific ontology, are presented.
Abstract: Creating a database schema is essentially a manual
process. The information contained in a requirement specification
has to be analyzed and reduced to a set of tables, attributes
and relationships. This is a time-consuming process that has to go
through several stages before an acceptable database schema is
achieved. The purpose of this paper is to implement a Natural
Language Processing (NLP) based tool to produce a relational
database from a requirement specification. Stanford CoreNLP
version 3.3.1 and the Java programming language were used to implement
the proposed model. The outcome of this study indicates that a first draft
of a relational database schema can be extracted from a requirement
specification by using NLP tools and techniques with minimal user
intervention. This method is therefore a step towards a
solution that requires little or no user intervention.
Abstract: This paper analyzes the role of natural
language processing (NLP) in the
context of automated data retrieval, automated question answering, and
text structuring. NLP techniques are gaining wider acceptance in
real-life applications and in industry. Processing natural language
text in a way that satisfies the needs of decision makers involves
various complexities. This paper begins with the
description of the qualities of NLP practices. The paper then focuses
on the challenges in natural language processing. The paper also
discusses major techniques of NLP. The last section describes
opportunities and challenges for future research.
Abstract: The paper follows a discourse on computer-assisted
language learning. We examine problems of foreign language
teaching and learning and introduce a metamodel that can be used to
define learning models of language grammar structures in order to
support teacher/student interaction. Special attention is paid to the
concept of a virtual language lab. Our approach to language
education aims to encourage learners to experiment with a
language and to learn by discovering patterns of grammatically
correct structures created and managed by a language expert.
Abstract: Developing reliable and sustainable software products is today a big challenge for up-and-coming software developers in Nigeria. The ability to develop the comprehensive problem statement needed to carry out a proper requirements engineering process is often missing. Describing the ‘what’ of a system in one document, written in natural language, is a major step in the overall Software Engineering process. Requirements Engineering is a process used to discover, analyze and validate system requirements. This process is needed to reduce software errors at an early stage of software development. The importance of each step in Requirements Engineering is explained in the context of using a detailed problem statement from the client/customer to get an overview of the existing system along with expectations for the new system. This paper identifies inadequate application of Requirements Engineering principles as the major cause of poor software development in developing nations, using a case study of final-year computer science students at a tertiary education institution in Nigeria.
Abstract: Phenomenological analysis is based not on natural language but on an ideal language that is able to carry ideal meanings: eidos, which represent typical structures or essences. For this purpose, it is necessary to free the subject from its spatio-temporal definiteness and then state its noetic essence (eidos) by means of free fantasy generation. In this way a totally new objectness, the universal, is created, as it were, confirming the thesis that the thinking process takes place in generalizations, passing by numerous means through the specific to the general and from the general through the specific to the singular.
Abstract: This paper presents a visual computer-aided case tool for non-experts, called Visual Time, for representing and reasoning about incomplete and uncertain temporal information. It is both expressive and versatile, allowing logical conjunctions and disjunctions of both absolute and relative temporal relations, such as “Before”, “Meets”, “Overlaps”, “Starts”, “During”, and “Finishes”. As a visual framework, Visual Time provides a user-friendly environment for describing scenarios with rich temporal structure in natural language, which can be formatted as structured temporal phrases and modeled as Temporal Relationship Diagrams (TRDs). A TRD can be automatically and visually transformed into a corresponding Time Graph, supported by an automatic consistency checker that delivers a verdict on whether a given scenario is temporally consistent or inconsistent.
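The relative relations named in this abstract can be made concrete with a small sketch. The Python below is not Visual Time's implementation: it assumes every interval has fully known numeric endpoints, whereas the tool also handles incomplete and uncertain information. The `Interval` type and the `relation` and `is_consistent` functions are hypothetical names chosen for illustration, and "Equal" is added only so that every pair of intervals gets a label.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float


def relation(a: Interval, b: Interval) -> str:
    """Name the temporal relation that holds from interval a to interval b."""
    if a.start >= a.end or b.start >= b.end:
        raise ValueError("intervals must have start < end")
    if a.end < b.start:
        return "Before"
    if a.end == b.start:
        return "Meets"
    if a.start == b.start and a.end == b.end:
        return "Equal"
    if a.start == b.start and a.end < b.end:
        return "Starts"
    if a.start > b.start and a.end < b.end:
        return "During"
    if a.start > b.start and a.end == b.end:
        return "Finishes"
    if a.start < b.start < a.end < b.end:
        return "Overlaps"
    # None of the base relations holds, so one of them holds in the other direction.
    return "inverse of " + relation(b, a)


def is_consistent(intervals, claims):
    """Check claimed (name_a, relation, name_b) triples against known endpoints."""
    return all(relation(intervals[a], intervals[b]) == rel for a, rel, b in claims)


scenario = {"meeting": Interval(9, 10), "lunch": Interval(10, 11)}
print(is_consistent(scenario, [("meeting", "Meets", "lunch")]))
```

With exact endpoints, consistency checking reduces to comparing numbers; the harder case the abstract describes, where only disjunctions of relations are known, needs constraint propagation over a graph instead.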
Abstract: The Internet is one of the major sources of information for
people in almost all walks of life. The major language
used to publish information on the Internet is English. This
becomes a problem in a country like Pakistan, where Urdu is the
national language. Only 10% of the Pakistani population can understand
English. As a result, millions of people are deprived of the valuable
information available on the Internet. This paper presents a system for
translation from English to Urdu. A module, LESSA, uses
a rule-based algorithm to read the input text in English,
understand it and translate it into Urdu. The designed
approach was further extended to translate a complete website
from English to Urdu. An option appears in the
browser to translate the webpage in a new window. The designed
system will help millions of Internet users to benefit from the
Internet and access the latest information and knowledge posted
daily on the Internet.
Abstract: Natural language processing systems pose a unique
challenge for software architectural design as system complexity has
increased continually and systems cannot be easily constructed from
loosely coupled modules. Lexical, syntactic, semantic, and pragmatic
aspects of linguistic information are tightly coupled in a manner that
requires separation of concerns in a special way in design,
implementation and maintenance. An aspect-oriented software
architecture is proposed in this paper after critically reviewing
relevant architectural issues. For the purpose of this paper, the
syntactic aspect is characterized by an augmented context-free
grammar. The semantic aspect is composed of multiple perspectives
including denotational, operational, axiomatic and case frame
approaches. Case frame semantics matured in India from deep
thematic analysis. It is argued that lexical, syntactic, semantic and
pragmatic aspects work together in a mutually dependent way and
their synergy is best represented in the aspect-oriented approach. The
software architecture is presented with an augmented Unified
Modeling Language.
Abstract: Machine Translation (MT) between the Thai and English languages has been a challenging research topic in natural language processing. Most research has addressed English-to-Thai machine translation, but not the other direction. This paper presents a Thai-to-English Machine Translation System that translates a Thai sentence into an interlingua in the form of a Thai LFG tree, using an LFG grammar and a bottom-up parser. The Thai LFG tree is then transformed into the corresponding English LFG tree by pattern matching and node transformation. Finally, an equivalent English sentence is created using the structural information prescribed by the English LFG tree. The results of experiments designed to evaluate the performance of the proposed system indicate that it is effective in providing a useful translation from Thai to English.
Abstract: There has been a growing interest in implementing humanoid avatars in networked virtual environments. However, most existing avatar communication systems do not take avatars' social backgrounds into consideration. This paper proposes a novel humanoid avatar animation system to represent personalities and facial emotions of avatars based on culture, profession, mood, age, taste, and so forth. We extract semantic keywords from the input text through natural language processing, and then the animations of personalized avatars are retrieved and displayed according to the order of the keywords. Our primary work is focused on giving avatars runtime instructions in multiple natural languages. Experiments with Chinese, Japanese and English input based on the prototype show that interactive avatar animations can be displayed in real time and be made available online. This system provides a more natural and interesting means of human communication, and is therefore expected to be used for cross-cultural communication, multiuser online games, and other entertainment applications.
Abstract: The evaluation of question answering systems is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when question answering systems began to become more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions and at the same time achieve higher quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and, in a later part, puts forward the call for new standard measures and the related considerations. As a short-term solution for evaluating response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of different nature, this research presents a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI).
Abstract: This paper is concerned with the production of an Arabic word semantic similarity (WSS) benchmark dataset. It is the first of its kind for Arabic and was developed specifically to assess the accuracy of word semantic similarity measurements. Semantic similarity is an essential component of numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English. To the best of our knowledge, there is no word similarity measure developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and the best available techniques have been used in this study to produce the Arabic dataset. These include selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS, and we hope it will be considered a reference basis from which to evaluate and compare different methodologies in the field.
Abstract: This paper presents a rule-based text-to-speech
(TTS) synthesis system for Standard Malay (SM), namely SMaTTS. The
proposed system uses a sinusoidal method and some pre-recorded
wave files to generate speech. The use of a phone
database significantly decreases the amount of computer memory
used, thus making the system very light and embeddable. The
overall system comprises two phases. The first is Natural Language
Processing (NLP), which consists of the high-level processing of text
analysis, phonetic analysis, text normalization and a morphophonemic
module. This module was designed specially for SM to overcome
a few problems in defining the rules for the SM orthography system before
the text is passed to the DSP module. The second phase is Digital
Signal Processing (DSP), which operates on the low-level process of
speech waveform generation. An intelligible and
adequately natural-sounding formant-based speech synthesis system
with a light and user-friendly Graphical User Interface (GUI) is
introduced. An SM phoneme set and an
inclusive phone database have been constructed carefully for
this phone-based speech synthesizer. By applying generative
phonology, comprehensive letter-to-sound (LTS) rules and a
pronunciation lexicon have been developed for SMaTTS. For the
evaluation tests, a Diagnostic Rhyme Test (DRT) word list was
compiled and several experiments were performed to evaluate
the quality of the synthesized speech by analyzing the Mean Opinion
Score (MOS) obtained. The overall performance of the system, as
well as the room for improvement, is discussed.
Abstract: Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Several strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of applying Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning method, to the task of Thai noun and verb word sense disambiguation. LSI has been shown to be efficient and effective for information retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely /hua4/ and /kep1/, which are used as representatives of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.
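As a rough illustration of how LSI can be applied to sense disambiguation, the sketch below factors a tiny term-by-context matrix with a truncated SVD, folds a new context into the latent space, and picks the sense whose training contexts are closest by cosine similarity. This is not the authors' system. The vocabulary, the counts, the sense labels, and the function names `build_lsi` and `disambiguate` are all hypothetical, and toy English context words stand in for a Thai corpus. The training contexts carry sense labels here only to make the comparison readable, whereas the abstract describes an unsupervised method. NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy data: English context words stand in for a real corpus.
VOCAB = ["hair", "face", "leader", "team"]
# Each training context is (word counts in VOCAB order, sense label).
CONTEXTS = [
    ([1, 1, 0, 0], "sense_A"),
    ([2, 1, 0, 0], "sense_A"),
    ([0, 0, 1, 1], "sense_B"),
    ([0, 0, 1, 3], "sense_B"),
]


def build_lsi(k=2):
    """Factor the term-by-context matrix and keep the k largest singular triplets."""
    matrix = np.array([counts for counts, _ in CONTEXTS], dtype=float).T
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Term space, singular values, and one latent vector per training context.
    return u[:, :k], s[:k], vt[:k].T


def disambiguate(words, k=2):
    """Return the sense whose training contexts are closest to the new context."""
    u_k, s_k, context_vecs = build_lsi(k)
    query = np.array([words.count(term) for term in VOCAB], dtype=float)
    folded = (query @ u_k) / s_k  # fold the new context into the latent space
    scores = {}
    for vec, (_, sense) in zip(context_vecs, CONTEXTS):
        denom = np.linalg.norm(folded) * np.linalg.norm(vec)
        cosine = float(folded @ vec / denom) if denom > 0 else 0.0
        scores.setdefault(sense, []).append(cosine)
    # Average the similarities per sense and pick the best-scoring sense.
    return max(scores, key=lambda sense: np.mean(scores[sense]))


print(disambiguate(["hair"]))
```

Averaging the cosine per sense is one simple decision rule; a nearest-neighbour or clustering step in the latent space would be an equally plausible choice.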
Abstract: The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when chatterbots began to become more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions and at the same time to achieve high quality responses. This paper discusses the inappropriateness of the existing measures for response quality evaluation and puts forward the call for new standard measures and related considerations. As a short-term solution for evaluating response quality of conversational agents, and to demonstrate the challenges in evaluating systems of different nature, this research proposes a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems: AnswerBus, START and AINI.
Abstract: In this paper we describe the recognition process of Greek compound words using the PC-KIMMO software. We try to show certain limitations of the system with respect to the principles of compound formation in Greek. Moreover, we discuss the computational processing of phenomena such as stress and syllabification, which are indispensable for the analysis of such constructions, and we try to propose linguistically acceptable solutions within the particular system.
Abstract: Extracting thematic (semantic) roles is one of the
major steps in representing text meaning. It refers to finding the
semantic relations between a predicate and syntactic constituents in a
sentence. In this paper we present a rule-based approach to extract
semantic roles from Persian sentences. The system exploits a two-phase
architecture to (1) identify the arguments and (2) label them
for each predicate.
For the first phase we developed a rule-based shallow parser to
chunk Persian sentences, and for the second phase we developed a
knowledge-based system to assign 16 selected thematic roles to the
chunks. The experimental results of testing each phase are shown at
the end of the paper.
Abstract: The growing volume of information on the
Internet creates an increasing need for new (semi-)automatic
methods for retrieving documents and ranking them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated in various
circumstances. Our approach is a combination of conceptual,
statistical and linguistic methods. This combination preserves the
precision of ranking without losing speed. Our approach
exploits natural language processing techniques for extracting
phrases and stemming words. An ontology-based conceptual
method is then used to annotate documents and expand the query.
To expand a query, the spreading activation algorithm is improved so
that the expansion can be done in various aspects. The annotated
documents and the expanded query are then processed to compute
the relevance degree using statistical methods. The outstanding
features of our approach are (1) combining conceptual, statistical
and linguistic features of documents, (2) expanding the query with
its related concepts before comparing it to documents, (3) extracting
and using both words and phrases to compute the relevance degree, (4)
improving the spreading activation algorithm to do the expansion based
on a weighted combination of different conceptual relationships and
(5) allowing variable document vector dimensions. A ranking
system called ORank is developed to implement and test the
proposed model. The test results are presented at the end of the
paper.
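Query expansion by activation spreading over an ontology, weighted by relation type, can be sketched as follows. This is an illustrative reading of the idea, not ORank's algorithm, whose details the abstract does not give. The toy ontology, the relation weights, the threshold, and the function name `expand_query` are all hypothetical.

```python
# Hypothetical toy ontology: concept -> list of (related concept, relation type).
ONTOLOGY = {
    "car": [("vehicle", "is_a"), ("engine", "has_part")],
    "vehicle": [("transport", "is_a")],
    "engine": [("piston", "has_part")],
}
# Hypothetical weights: how strongly activation passes along each relation type.
RELATION_WEIGHTS = {"is_a": 0.8, "has_part": 0.5}


def expand_query(seeds, threshold=0.3, max_hops=2):
    """Spread activation from the seed concepts and return concept -> activation."""
    activation = {concept: 1.0 for concept in seeds}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, act in frontier.items():
            for neighbor, rel in ONTOLOGY.get(concept, []):
                new_act = act * RELATION_WEIGHTS[rel]
                # Keep a neighbor only if it is activated strongly enough
                # and more strongly than by any path seen so far.
                if new_act >= threshold and new_act > activation.get(neighbor, 0.0):
                    activation[neighbor] = new_act
                    next_frontier[neighbor] = new_act
        frontier = next_frontier
    return activation


expanded = expand_query(["car"])
print(sorted(expanded, key=expanded.get, reverse=True))
```

In this toy run, "piston" is left out because its activation after two "has_part" hops falls below the threshold, which is how the weights and the threshold together keep the expansion from drifting too far from the original query.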
Abstract: Document retrieval in Information Retrieval
Systems (IRS) is generally about understanding the
information in the documents concerned. The better the system
is able to understand the contents of documents, the more
effective the retrieval outcomes will be. But understanding the
contents is a very complex task. Conventional IRS apply algorithms
that can only approximate the meaning of document contents through
a keyword approach using the vector space model. Keywords may be
unstemmed or stemmed. When keywords are stemmed and conflated
in the retrieval process, we are a step closer to applying semantic
technology in IRS. Word stemming is a process in morphological
analysis under natural language processing, performed before syntactic
and semantic analysis. We have developed stemming algorithms for Malay
and Arabic and incorporated stemming in our experimental systems in
order to measure retrieval effectiveness. The results show that
retrieval effectiveness increases when stemming is used in
the systems.
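The effect of stemming and conflation on keyword matching can be seen in a minimal sketch. It uses a toy English suffix list purely for illustration and does not reproduce the Malay or Arabic stemming algorithms the abstract refers to; `SUFFIXES`, `stem`, and `keyword_overlap` are hypothetical names.

```python
# Hypothetical toy suffix list for English, tried in this order.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_overlap(query, document, use_stemming):
    """Count the keywords shared by query and document, optionally after stemming."""
    normalize = stem if use_stemming else (lambda word: word)
    query_terms = {normalize(word) for word in query.lower().split()}
    document_terms = {normalize(word) for word in document.lower().split()}
    return len(query_terms & document_terms)


query = "retrieving documents"
document = "the system retrieved a document"
print(keyword_overlap(query, document, use_stemming=False))
print(keyword_overlap(query, document, use_stemming=True))
```

Without stemming the query and the document share no keywords; with stemming, "retrieving" and "retrieved" are conflated to one stem, as are "documents" and "document", so both query terms match.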