Twitter Sentiment Analysis during the Lockdown on New Zealand

One of the most common fields of natural language processing (NLP) is sentimental analysis. The inferred feeling in the text can be successfully mined for various events using sentiment analysis. Twitter is viewed as a reliable data point for sentimental analytics studies since people are using social media to receive and exchange different types of data on a broad scale during the COVID-19 epidemic. The processing of such data may aid in making critical decisions on how to keep the situation under control. The aim of this research is to look at how sentimental states differed in a single geographic region during the lockdown at two different times.1162 tweets were analyzed related to the COVID-19 pandemic lockdown using keywords hashtags (lockdown, COVID-19) for the first sample tweets were from March 23, 2020, until April 23, 2020, and the second sample for the following year was from March 1, 2021, until April 4, 2021. Natural language processing (NLP), which is a form of Artificial intelligent was used for this research to calculate the sentiment value of all of the tweets by using AFINN Lexicon sentiment analysis method. The findings revealed that the sentimental condition in both different times during the region's lockdown was positive in the samples of this study, which are unique to the specific geographical area of New Zealand. This research suggests applied machine learning sentimental method such as Crystal Feel and extended the size of the sample tweet by using multiple tweets over a longer period of time.

Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.

Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Recently, the rapid development of deep learning makes artificial intelligence (AI) penetrate into many fields, replacing manual work there. In particular, AI systems also become a research focus in the field of automatic office. To meet real needs in automatic officiating, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data for training. We conduct a series of experiments to test my systems and the results show that our system can achieve better classification results.

DocPro: A Framework for Processing Semantic and Layout Information in Business Documents

With the recent advance of the deep neural network, we observe new applications of NLP (natural language processing) and CV (computer vision) powered by deep neural networks for processing business documents. However, creating a real-world document processing system needs to integrate several NLP and CV tasks, rather than treating them separately. There is a need to have a unified approach for processing documents containing textual and graphical elements with rich formats, diverse layout arrangement, and distinct semantics. In this paper, a framework that fulfills this unified approach is presented. The framework includes a representation model definition for holding the information generated by various tasks and specifications defining the coordination between these tasks. The framework is a blueprint for building a system that can process documents with rich formats, styles, and multiple types of elements. The flexible and lightweight design of the framework can help build a system for diverse business scenarios, such as contract monitoring and reviewing.

Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

The Arabic language is one of the most important languages. Learning it is so important for many people around the world because of its religious and economic importance and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting the syntactic mistakes of Arabic syntax according to their position in the sentence and focused on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using Stanford CoreNLP morphological analyzer and machine-learning approach in order to detect the syntactical mistakes and then correct it. A prototype of the proposed system was implemented and evaluated. It uses support vector machine (SVM) algorithm to detect Arabic grammatical errors and correct them using the rule-based approach. The prototype system has a far accuracy 81%. In general, it shows a set of useful grammatical suggestions that the user may forget about while writing due to lack of familiarity with grammar or as a result of the speed of writing such as alerting the user when using a plural term to indicate one person.

Q-Map: Clinical Concept Mining from Clinical Documents

Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.

Study of Syntactic Errors for Deep Parsing at Machine Translation

Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.

A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment

Over the past decade, there have been promising developments in Natural Language Processing (NLP) with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These models include models based on lexical similarities, models based on formal reasoning, and most recently deep neural models. In this paper, we present a sentence encoding model that exploits the sentence-to-sentence relation information for RTE. In terms of sentence modeling, Convolutional neural network (CNN) and recurrent neural networks (RNNs) adopt different approaches. RNNs are known to be well suited for sequence modeling, whilst CNN is suited for the extraction of n-gram features through the filters and can learn ranges of relations via the pooling mechanism. We combine the strength of RNN and CNN as stated above to present a unified model for the RTE task. Our model basically combines relation vectors computed from the phrasal representation of each sentence and final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representation for each sentence from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short Term Memory (Bi-LSTM) to obtain the final sentence representations from which a second relation vector is computed. The relations vectors are combined and then used in then used in the same fashion as attention mechanism over the Bi-LSTM outputs to yield the final sentence representations for the classification. Experiment on the Stanford Natural Language Inference (SNLI) corpus suggests that this is a promising technique for RTE.

Implementing a Database from a Requirement Specification

Creating a database scheme is essentially a manual process. From a requirement specification the information contained within has to be analyzed and reduced into a set of tables, attributes and relationships. This is a time consuming process that has to go through several stages before an acceptable database schema is achieved. The purpose of this paper is to implement a Natural Language Processing (NLP) based tool to produce a relational database from a requirement specification. The Stanford CoreNLP version 3.3.1 and the Java programming were used to implement the proposed model. The outcome of this study indicates that a first draft of a relational database schema can be extracted from a requirement specification by using NLP tools and techniques with minimum user intervention. Therefore this method is a step forward in finding a solution that requires little or no user intervention.

Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities

This paper aims to analyze the role of natural language processing (NLP). The paper will discuss the role in the context of automated data retrieval, automated question answer, and text structuring. NLP techniques are gaining wider acceptance in real life applications and industrial concerns. There are various complexities involved in processing the text of natural language that could satisfy the need of decision makers. This paper begins with the description of the qualities of NLP practices. The paper then focuses on the challenges in natural language processing. The paper also discusses major techniques of NLP. The last section describes opportunities and challenges for future research.

Query Reformulation Guided by External Resource for Information Retrieval

Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper tries to evaluate the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measures were conducted and two corpora with different external resources like Arabic WordNet (AWN) and the Arabic Dictionary (thesaurus) of Meaning (ADM) were used. Examination of the obtained results will allow us to measure the real contribution of this reformulation technique in improving the IRS performance.

Critical Assessment of Scoring Schemes for Protein-Protein Docking Predictions

Protein-protein interactions (PPI) play a crucial role in many biological processes such as cell signalling, transcription, translation, replication, signal transduction, and drug targeting, etc. Structural information about protein-protein interaction is essential for understanding the molecular mechanisms of these processes. Structures of protein-protein complexes are still difficult to obtain by biophysical methods such as NMR and X-ray crystallography, and therefore protein-protein docking computation is considered an important approach for understanding protein-protein interactions. However, reliable prediction of the protein-protein complexes is still under way. In the past decades, several grid-based docking algorithms based on the Katchalski-Katzir scoring scheme were developed, e.g., FTDock, ZDOCK, HADDOCK, RosettaDock, HEX, etc. However, the success rate of protein-protein docking prediction is still far from ideal. In this work, we first propose a more practical measure for evaluating the success of protein-protein docking predictions,the rate of first success (RFS), which is similar to the concept of mean first passage time (MFPT). Accordingly, we have assessed the ZDOCK bound and unbound benchmarks 2.0 and 3.0. We also createda new benchmark set for protein-protein docking predictions, in which the complexes have experimentally determined binding affinity data. We performed free energy calculation based on the solution of non-linear Poisson-Boltzmann equation (nlPBE) to improve the binding mode prediction. We used the well-studied thebarnase-barstarsystem to validate the parameters for free energy calculations. Besides,thenlPBE-based free energy calculations were conducted for the badly predicted cases by ZDOCK and ZRANK. We found that direct molecular mechanics energetics cannot be used to discriminate the native binding pose from the decoys.Our results indicate that nlPBE-based calculations appeared to be one of the promising approaches for improving the success rate of binding pose predictions.

The Economic Lot Scheduling Problem in Flow Lines with Sequence-Dependent Setups

The problem of lot sizing, sequencing and scheduling multiple products in flow line production systems has been studied by several authors. Almost all of the researches in this area assumed that setup times and costs are sequence –independent even though sequence dependent setups are common in practice. In this paper we present a new mixed integer non linear program (MINLP) and a heuristic method to solve the problem in sequence dependent case. Furthermore, a genetic algorithm has been developed which applies this constructive heuristic to generate initial population. These two proposed solution methods are compared on randomly generated problems. Computational results show a clear superiority of our proposed GA for majority of the test problems.

On Optimum Stratification

In this manuscript, we discuss the problem of determining the optimum stratification of a study (or main) variable based on the auxiliary variable that follows a uniform distribution. If the stratification of survey variable is made using the auxiliary variable it may lead to substantial gains in precision of the estimates. This problem is formulated as a Nonlinear Programming Problem (NLPP), which turn out to multistage decision problem and is solved using dynamic programming technique.

A Novel Optimal Setting for Directional over Current Relay Coordination using Particle Swarm Optimization

Over Current Relays (OCRs) and Directional Over Current Relays (DOCRs) are widely used for the radial protection and ring sub transmission protection systems and for distribution systems. All previous work formulates the DOCR coordination problem either as a Non-Linear Programming (NLP) for TDS and Ip or as a Linear Programming (LP) for TDS using recently a social behavior (Particle Swarm Optimization techniques) introduced to the work. In this paper, a Modified Particle Swarm Optimization (MPSO) technique is discussed for the optimal settings of DOCRs in power systems as a Non-Linear Programming problem for finding Ip values of the relays and for finding the TDS setting as a linear programming problem. The calculation of the Time Dial Setting (TDS) and the pickup current (Ip) setting of the relays is the core of the coordination study. PSO technique is considered as realistic and powerful solution schemes to obtain the global or quasi global optimum in optimization problem.

SMaTTS: Standard Malay Text to Speech System

This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed.

A Joint Routing-Scheduling Approach for Throughput Optimization in WMNs

Wireless Mesh Networking is a promising proposal for broadband data transmission in a large area with low cost and acceptable QoS. These features- trade offs in WMNs is a hot research field nowadays. In this paper a mathematical optimization framework has been developed to maximize throughput according to upper bound delay constraints. IEEE 802.11 based infrastructure backhauling mode of WMNs has been considered to formulate the MINLP optimization problem. Proposed method gives the full routing and scheduling procedure in WMN in order to obtain mentioned goals.

Ontology Population via NLP Techniques in Risk Management

In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi automatic instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses a domain expert intervention for validation. The proposed approach relies on a set of Instances Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA1 project (supported by the European community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from Environmental Protection Agency2. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.

Cross Signal Identification for PSG Applications

The standard investigational method for obstructive sleep apnea syndrome (OSAS) diagnosis is polysomnography (PSG), which consists of a simultaneous, usually overnight recording of multiple electro-physiological signals related to sleep and wakefulness. This is an expensive, encumbering and not a readily repeated protocol, and therefore there is need for simpler and easily implemented screening and detection techniques. Identification of apnea/hypopnea events in the screening recordings is the key factor for the diagnosis of OSAS. The analysis of a solely single-lead electrocardiographic (ECG) signal for OSAS diagnosis, which may be done with portable devices, at patient-s home, is the challenge of the last years. A novel artificial neural network (ANN) based approach for feature extraction and automatic identification of respiratory events in ECG signals is presented in this paper. A nonlinear principal component analysis (NLPCA) method was considered for feature extraction and support vector machine for classification/recognition. An alternative representation of the respiratory events by means of Kohonen type neural network is discussed. Our prospective study was based on OSAS patients of the Clinical Hospital of Pneumology from Iaşi, Romania, males and females, as well as on non-OSAS investigated human subjects. Our computed analysis includes a learning phase based on cross signal PSG annotation.

WebGD: A CORBA-based Document Classification and Retrieval System on the Web

This paper presents the design and implementation of the WebGD, a CORBA-based document classification and retrieval system on Internet. The WebGD makes use of such techniques as Web, CORBA, Java, NLP, fuzzy technique, knowledge-based processing and database technology. Unified classification and retrieval model, classifying and retrieving with one reasoning engine and flexible working mode configuration are some of its main features. The architecture of WebGD, the unified classification and retrieval model, the components of the WebGD server and the fuzzy inference engine are discussed in this paper in detail.