Q-Map: Clinical Concept Mining from Clinical Documents

Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.

Teaching Turn-Taking Rules and Pragmatic Principles to Empower EFL Students and Enhance Their Learning in Speaking Modules

Teaching and learning EFL speaking modules is one of the most challenging productive modules for both instructors and learners. In a student-centered interactive communicative language teaching approach, learners and instructors should be aware of the fact that the target language must be taught as/for communication. The student must be empowered by tools that will work on more than one level of their communicative competence. Communicative learning will need a teaching and learning methodology that will address the goal. Teaching turn-taking rules, pragmatic principles and speech acts will enhance students' sociolinguistic competence, strategic competence together with discourse competence. Sociolinguistic competence entails the mastering of speech act conventions and illocutionary acts of refusing, agreeing/disagreeing; emotive acts like, thanking, apologizing, inviting, offering; directives like, ordering, requesting, advising, and hinting, among others. Strategic competence includes enlightening students’ consciousness of the various particular turn-taking systemic rules of organizing techniques of opening and closing conversation, adjacency pairs, interrupting, back-channeling, asking for/giving opinion, agreeing/disagreeing, using natural fillers for pauses, gaps, speaker select, self-select, and silence among others. Students will have the tools to manage a conversation. Students are engaged in opportunities of experiencing the natural language not as a mere extra student talking time but rather an empowerment of knowing and using the strategies. They will have the component items they need to use as well as the opportunity to communicate in the target language using topics of their interest and choice. This enhances students' communicative abilities. Available websites and textbooks now use one or more of these tools of turn-taking or pragmatics. These will be students' support in self-study in their independent learning study hours. This will be their reinforcement practice on e-Learning interactive activities. The students' target is to be able to communicate the intended meaning to an addressee that is in turn able to infer that intended meaning. The combination of these tools will be assertive and encouraging to the student to beat the struggle with what to say, how to say it, and when to say it. Teaching the rules, principles and techniques is an act of awareness raising method engaging students in activities that will lead to their pragmatic discourse competence. The aim of the paper is to show how the suggested pragmatic model will empower students with tools and systems that would support their learning. Supporting students with turn taking rules, speech act theory, applying both to texts and practical analysis and using it in speaking classes empowers students’ pragmatic discourse competence and assists them to understand language and its context. They become more spontaneous and ready to learn the discourse pragmatic dimension of the speaking techniques and suitable content. Students showed a better performance and a good motivation to learn. The model is therefore suggested for speaking modules in EFL classes.

Online Multilingual Dictionary Using Hamburg Notation for Avatar-Based Indian Sign Language Generation System

Sign Language (SL) is used by deaf and other people who cannot speak but can hear or have a problem with spoken languages due to some disability. It is a visual gesture language that makes use of either one hand or both hands, arms, face, body to convey meanings and thoughts. SL automation system is an effective way which provides an interface to communicate with normal people using a computer. In this paper, an avatar based dictionary has been proposed for text to Indian Sign Language (ISL) generation system. This research work will also depict a literature review on SL corpus available for various SL s over the years. For ISL generation system, a written form of SL is required and there are certain techniques available for writing the SL. The system uses Hamburg sign language Notation System (HamNoSys) and Signing Gesture Mark-up Language (SiGML) for ISL generation. It is developed in PHP using Web Graphics Library (WebGL) technology for 3D avatar animation. A multilingual ISL dictionary is developed using HamNoSys for both English and Hindi Language. This dictionary will be used as a database to associate signs with words or phrases of a spoken language. It provides an interface for admin panel to manage the dictionary, i.e., modification, addition, or deletion of a word. Through this interface, HamNoSys can be developed and stored in a database and these notations can be converted into its corresponding SiGML file manually. The system takes natural language input sentence in English and Hindi language and generate 3D sign animation using an avatar. SL generation systems have potential applications in many domains such as healthcare sector, media, educational institutes, commercial sectors, transportation services etc. This research work will help the researchers to understand various techniques used for writing SL and generation of Sign Language systems.

Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements

Where human beings can easily learn and adopt pronunciation variations, machines need training before put into use. Also humans keep minimum vocabulary and their pronunciation variations are stored in front-end of their memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. Supervised methods are used for preparation of pronunciation dictionaries which take large amounts of manual effort, cost, time and are not suitable for real time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic human approach in learning the new pronunciations in real time. A new algorithm for measuring sound distances called Dynamic Phone Warping is presented and tested. Performance of the system is measured using an adaptation model and the precision metrics is found to be better than 86 percent.

Social Media Idea Ontology: A Concept for Semantic Search of Product Ideas in Customer Knowledge through User-Centered Metrics and Natural Language Processing

In order to survive on the market, companies must constantly develop improved and new products. These products are designed to serve the needs of their customers in the best possible way. The creation of new products is also called innovation and is primarily driven by a company’s internal research and development department. However, a new approach has been taking place for some years now, involving external knowledge in the innovation process. This approach is called open innovation and identifies customer knowledge as the most important source in the innovation process. This paper presents a concept of using social media posts as an external source to support the open innovation approach in its initial phase, the Ideation phase. For this purpose, the social media posts are semantically structured with the help of an ontology and the authors are evaluated using graph-theoretical metrics such as density. For the structuring and evaluation of relevant social media posts, we also use the findings of Natural Language Processing, e. g. Named Entity Recognition, specific dictionaries, Triple Tagger and Part-of-Speech-Tagger. The selection and evaluation of the tools used are discussed in this paper. Using our ontology and metrics to structure social media posts enables users to semantically search these posts for new product ideas and thus gain an improved insight into the external sources such as customer needs.

Fuzzy Set Approach to Study Appositives and Its Impact Due to Positional Alterations

Computing with Words (CWW) and Possibilistic Relational Universal Fuzzy (PRUF) are the two concepts which widely represent and measure the vaguely defined natural phenomenon. In this paper, we study the positional alteration of the phrases by which the impact of a natural language proposition gets affected and/or modified. We observe the gradations due to sensitivity/feeling of a statement towards the positional alterations. We derive the classification and modification of the meaning of words due to the positional alteration. We present the results with reference to set theoretic interpretations.

Evaluating 8D Reports Using Text-Mining

Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.

Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features

Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.

Study of Syntactic Errors for Deep Parsing at Machine Translation

Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.

Definition of a Computing Independent Model and Rules for Transformation Focused on the Model-View-Controller Architecture

This paper presents a model-oriented development approach to software development in the Model-View-Controller (MVC) architectural standard. This approach aims to expose a process of extractions of information from the models, in which through rules and syntax defined in this work, assists in the design of the initial model and its future conversions. The proposed paper presents a syntax based on the natural language, according to the rules agreed in the classic grammar of the Portuguese language, added to the rules of conversions generating models that follow the norms of the Object Management Group (OMG) and the Meta-Object Facility MOF.

Designing and Evaluating Pedagogic Conversational Agents to Teach Children

In this paper, the possibility of children studying by using an interactive learning technology called Pedagogic Conversational Agent is presented. The main benefit is that the agent is able to adapt the dialogue to each student and to provide automatic feedback. Moreover, according to Math teachers, in many cases students are unable to solve the problems even knowing the procedure to solve them, because they do not understand what they have to do. The hypothesis is that if students are helped to understand what they have to solve, they will be able to do it. Taken that into account, we have started the development of Dr. Roland, an agent to help students understand Math problems following a User-Centered Design methodology. The use of this methodology is proposed, for the first time, to design pedagogic agents to teach any subject from Secondary down to Pre-Primary education. The reason behind proposing a methodology is that while working on this project, we noticed the lack of literature to design and evaluate agents. To cover this gap, we describe how User-Centered Design can be applied, and which usability techniques can be applied to evaluate the agent.

Automated User Story Driven Approach for Web-Based Functional Testing

Manual writing of test cases from functional requirements is a time-consuming task. Such test cases are not only difficult to write but are also challenging to maintain. Test cases can be drawn from the functional requirements that are expressed in natural language. However, manual test case generation is inefficient and subject to errors.  In this paper, we have presented a systematic procedure that could automatically derive test cases from user stories. The user stories are specified in a restricted natural language using a well-defined template.  We have also presented a detailed methodology for writing our test ready user stories. Our tool “Test-o-Matic” automatically generates the test cases by processing the restricted user stories. The generated test cases are executed by using open source Selenium IDE.  We evaluate our approach on a case study, which is an open source web based application. Effectiveness of our approach is evaluated by seeding faults in the open source case study using known mutation operators.  Results show that the test case generation from restricted user stories is a viable approach for automated testing of web applications.

Part of Speech Tagging Using Statistical Approach for Nepali Text

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment

Over the past decade, there have been promising developments in Natural Language Processing (NLP) with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These models include models based on lexical similarities, models based on formal reasoning, and most recently deep neural models. In this paper, we present a sentence encoding model that exploits the sentence-to-sentence relation information for RTE. In terms of sentence modeling, Convolutional neural network (CNN) and recurrent neural networks (RNNs) adopt different approaches. RNNs are known to be well suited for sequence modeling, whilst CNN is suited for the extraction of n-gram features through the filters and can learn ranges of relations via the pooling mechanism. We combine the strength of RNN and CNN as stated above to present a unified model for the RTE task. Our model basically combines relation vectors computed from the phrasal representation of each sentence and final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representation for each sentence from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short Term Memory (Bi-LSTM) to obtain the final sentence representations from which a second relation vector is computed. The relations vectors are combined and then used in then used in the same fashion as attention mechanism over the Bi-LSTM outputs to yield the final sentence representations for the classification. Experiment on the Stanford Natural Language Inference (SNLI) corpus suggests that this is a promising technique for RTE.

Context Detection in Spreadsheets Based on Automatically Inferred Table Schema

Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.

A Hybrid Multi-Criteria Hotel Recommender System Using Explicit and Implicit Feedbacks

Recommender systems, also known as recommender engines, have become an important research area and are now being applied in various fields. In addition, the techniques behind the recommender systems have been improved over the time. In general, such systems help users to find their required products or services (e.g. books, music) through analyzing and aggregating other users’ activities and behavior, mainly in form of reviews, and making the best recommendations. The recommendations can facilitate user’s decision making process. Despite the wide literature on the topic, using multiple data sources of different types as the input has not been widely studied. Recommender systems can benefit from the high availability of digital data to collect the input data of different types which implicitly or explicitly help the system to improve its accuracy. Moreover, most of the existing research in this area is based on single rating measures in which a single rating is used to link users to items. This paper proposes a highly accurate hotel recommender system, implemented in various layers. Using multi-aspect rating system and benefitting from large-scale data of different types, the recommender system suggests hotels that are personalized and tailored for the given user. The system employs natural language processing and topic modelling techniques to assess the sentiment of the users’ reviews and extract implicit features. The entire recommender engine contains multiple sub-systems, namely users clustering, matrix factorization module, and hybrid recommender system. Each sub-system contributes to the final composite set of recommendations through covering a specific aspect of the problem. The accuracy of the proposed recommender system has been tested intensively where the results confirm the high performance of the system.

User Guidance for Effective Query Interpretation in Natural Language Interfaces to Ontologies

Natural Language Interfaces typically support a restricted language and also have scopes and limitations that naïve users are unaware of, resulting in errors when the users attempt to retrieve information from ontologies. To overcome this challenge, an auto-suggest feature is introduced into the querying process where users are guided through the querying process using interactive query construction system. Guiding users to formulate their queries, while providing them with an unconstrained (or almost unconstrained) way to query the ontology results in better interpretation of the query and ultimately lead to an effective search. The approach described in this paper is unobtrusive and subtly guides the users, so that they have a choice of either selecting from the suggestion list or typing in full. The user is not coerced into accepting system suggestions and can express himself using fragments or full sentences.

An Analysis of Learners’ Reports for Measuring Co-Creational Education

To increase the quality of learning, teacher and learner need mutual effort for realization of educational value. For this purpose, we need to manage the co-creational education among teacher and learners. In this research, we try to find a feature of co-creational education. To be more precise, we analyzed learners’ reports by natural language processing, and extract some features that describe the state of the co-creational education.

Formal Specification of Web Services Applications for Digital Reference Services of Library Information System

Digital reference service is when a traditional library reference service is provided electronically. In most cases users do not get full satisfaction from using digital reference service due to variety of reasons. This paper discusses the formal specification of web services applications for digital reference services (WSDRS). WSDRS is an informal model that claims to reduce the problems of digital reference services in libraries. It uses web services technology to provide efficient digital way of satisfying users’ need in the reference section of libraries. Informal model is in natural language which is inconsistent and ambiguous that may cause difficulties to the developers of the system. In order to solve this problem we decided to convert the informal specifications into formal specifications. This is supposed to reduce the overall development time and cost. We use Z language to develop the formal model and verify it with Z/EVES theorem prover tool.

Natural Language News Generation from Big Data

In this paper, we introduce an NLG application for the automatic creation of ready-to-publish texts from big data. The resulting fully automatic generated news stories have a high resemblance to the style in which the human writer would draw up such a story. Topics include soccer games, stock exchange market reports, and weather forecasts. Each generated text is unique. Readyto-publish stories written by a computer application can help humans to quickly grasp the outcomes of big data analyses, save timeconsuming pre-formulations for journalists and cater to rather small audiences by offering stories that would otherwise not exist.