Abstract: The morphology of most Persian words goes back to the Indo-European and Indo-Iranian periods. These words show the beliefs and views of the earliest people about their structure. It is also necessary to search for the vocabulary in the Indo-European and Indo-Iranian periods. During recent centuries, comparative linguistics and mythology have facilitated the common Indo-European lexicon to reconstruct. The Persians have been appeared in the Assyrian inscriptions and affected by the Mesopotamians. It is also worth paying attention to the cultural and linguistic exchanges with the Mesopotamian civilizations. This paper aims to show the morphology of Pārsa based on linguistic evolutions and historical-mythological traditions. The method of this study is also to reconstruct both morphology and the earliest form of Persia. Then, it is tried to find the most plausive meaning according to the historical-mythological traditions. In the end, the sickle or scythe is considered the most probable meaning for Pārsa.
Abstract: Facebook, Twitter, Weibo, and other social media and significant e-commerce sites generate a massive amount of online texts, which can be used to analyse people’s opinions or sentiments for better decision-making. So, sentiment analysis, especially the fine-grained sentiment analysis, is a very active research topic. In this paper, we survey various methods for fine-grained sentiment analysis, including traditional sentiment lexicon-based methods, ma-chine learning-based methods, and deep learning-based methods in aspect/target/attribute-based sentiment analysis tasks. Besides, we discuss their advantages and problems worthy of careful studies in the future.
Abstract: One of the most common fields of natural language processing (NLP) is sentimental analysis. The inferred feeling in the text can be successfully mined for various events using sentiment analysis. Twitter is viewed as a reliable data point for sentimental analytics studies since people are using social media to receive and exchange different types of data on a broad scale during the COVID-19 epidemic. The processing of such data may aid in making critical decisions on how to keep the situation under control. The aim of this research is to look at how sentimental states differed in a single geographic region during the lockdown at two different times.1162 tweets were analyzed related to the COVID-19 pandemic lockdown using keywords hashtags (lockdown, COVID-19) for the first sample tweets were from March 23, 2020, until April 23, 2020, and the second sample for the following year was from March 1, 2021, until April 4, 2021. Natural language processing (NLP), which is a form of Artificial intelligent was used for this research to calculate the sentiment value of all of the tweets by using AFINN Lexicon sentiment analysis method. The findings revealed that the sentimental condition in both different times during the region's lockdown was positive in the samples of this study, which are unique to the specific geographical area of New Zealand. This research suggests applied machine learning sentimental method such as Crystal Feel and extended the size of the sample tweet by using multiple tweets over a longer period of time.
Abstract: During COVID-19, the depression rate has increased dramatically. Young adults are most vulnerable to the mental health effects of the pandemic. Lower-income families have a higher ratio to be diagnosed with depression than the general population, but less access to clinics. This research aims to achieve early depression detection at low cost, large scale, and high accuracy with an interdisciplinary approach by incorporating clinical practices defined by American Psychiatric Association (APA) as well as multimodal AI framework. The proposed approach detected the nine depression symptoms with Natural Language Processing sentiment analysis and a symptom-based Lexicon uniquely designed for young adults. The experiments were conducted on the multimedia survey results from adolescents and young adults and unbiased Twitter communications. The result was further aggregated with the facial emotional cues analyzed by the Convolutional Neural Network on the multimedia survey videos. Five experiments each conducted on 10k data entries reached consistent results with an average accuracy of 88.31%, higher than the existing natural language analysis models. This approach can reach 300+ million daily active Twitter users and is highly accessible by low-income populations to promote early depression detection to raise awareness in adolescents and young adults and reveal complementary cues to assist clinical depression diagnosis.
Abstract: In a time period populated by legacy newspaper readers who throw around the term “fake news” as though it has long been a part of the lexicon, journalism schools must convince would-be students that their degree is still viable and that they are not teaching a curriculum of deception. As such, journalism schools’ academic administrators tasked with creating and maintaining conversant curricula must stay ahead of legacy newspaper industry trends – both in the print and online products – and ensure that what is being taught in the classroom is both fresh and appropriate to the demands of the evolving legacy newspaper industry. This study examines the information obtained from the result of interviews of journalism academic administrators in order to identify institutional pedagogy for recent journalism school graduates interested in pursuing careers at legacy newspapers. This research also explores the existing relationship between journalism school academic administrators and legacy newspaper editors. The results indicate the value administrators put on various academy teachings, and they also highlight a perceived disconnect between journalism academic administrators and legacy newspaper hiring editors.
Abstract: Sentiment analysis is a broad and expanding field that aims to extract and classify opinions from textual data. Lexicon-based approaches are based on the use of a sentiment lexicon, i.e., a list of words each mapped to a sentiment score, to rate the sentiment of a text chunk. Our work focuses on predicting stock price change using a sentiment lexicon built from financial conference call logs. We present a method to generate a sentiment lexicon based upon an existing probabilistic approach. By using a domain-specific lexicon, we outperform traditional techniques and demonstrate that domain-specific sentiment lexicons provide higher accuracy than generic sentiment lexicons when predicting stock price change.
Abstract: Mass media seem to provide a rich content for language acquisition. Exposure to television, the Internet, the mobile phone and other technological gadgets and devices helps enrich the student’s lexicon positively as well as negatively. The difficulties encountered by students while learning and acquiring second languages in addition to their eagerness to comprehend the content of a particular program prompt them to diversify their methods so as to achieve their targets. The present study highlights the significance of certain media channels and their involvement in language acquisition with the employment of the Natural Approach to further grasp whether students, especially secondary and high school students, learn and acquire errors through watching subtitled television programs. The chief objective is investigating the deductive and inductive relevance of certain programs beside the involvement of peripheral learning while acquiring mistakes.
Abstract: Sentiment analysis is a broad and expanding field that aims to extract and classify opinions from textual data. Lexicon-based approaches are based on the use of a sentiment lexicon, i.e., a list of words each mapped to a sentiment score, to rate the sentiment of a text chunk. Our work focuses on predicting stock price change using a sentiment lexicon built from financial conference call logs. We introduce a method to generate a sentiment lexicon based upon an existing probabilistic approach. By using a domain-specific lexicon, we outperform traditional techniques and demonstrate that domain-specific sentiment lexicons provide higher accuracy than generic sentiment lexicons when predicting stock price change.
Abstract: Objectives: The purpose of this study was to understand and examine the association between diurnal mood variation and sexual/health-related status among men who have sex with men (MSM) using data from MSM Chinese Twitter messages. The study consists of 843,745 postings of 377,610 MSM users located in Guangdong that were culled from the MSM Chinese Twitter App. Positive affect, negative affect, sexual related behaviors, and health-related status were measured using the Simplified Chinese Linguistic Inquiry and Word Count. Emotions, including joy, sadness, anger, fear, and disgust were measured using the Weibo Basic Mood Lexicon. A positive sentiment score and a positive emotions score were also calculated. Linear regression models based on a permutation test were used to assess associations between affective states and sexual/health-related status. In the results, 5,871 active MSM users and their 477,374 postings were finally selected. MSM expressed positive affect and joy at 8 a.m. and expressed negative affect and negative emotions between 2 a.m. and 4 a.m. In addition, 25.1% of negative postings were directly related to health and 13.4% reported seeking social support during that sensitive period. MSM who were senior, educated, overweight or obese, self-identified as performing a versatile sex role, and with less followers, more followers, and less chat groups mainly expressed more negative affect and negative emotions. MSM who talked more about sexual-related behaviors had a higher positive sentiment score (β=0.29, p < 0.001) and a higher positive emotions score (β = 0.16, p < 0.001). MSM who reported more on their health status had a lower positive sentiment score (β = -0.83, p < 0.001) and a lower positive emotions score (β = -0.37, p < 0.001). The study concluded that psychological intervention based on an app for MSM should be conducted, as it may improve mental health.
Abstract: Many Asian languages, such as Korean and Japanese, are well-known for their wide use of sound symbolic words or ideophones. This is a very particular characteristic which enriches its lexicon hugely. Ideophones are a class of sound symbolic words that utilize sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action or movement. Ideophones have very particular characteristics in terms of sound symbolism and morphology, which distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut or vowel gradation and consonant mutation. In the case of Korean, there are light vowels and dark vowels. Depending on the type of vowel that is used, the meaning will slightly change. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological singularity, which is reduplication and it carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature as they enhance the meaning of what is trying to be expressed with incredible semantic detail, expressiveness, and rhythm. The following study will analyze the ideophones used in a single paragraph of a Korean novel, which add incredible yet subtle detail to the meaning of the words, and advance the expressiveness and rhythm of the text. The results from analyzing one paragraph from a novel, after presenting the phonological and morphological characteristics of Korean ideophones, will evidence the important role that ideophones play in literature.
Abstract: The paper deals with the main issues of methodology of the Corpus of Spoken Lithuanian which was started to be developed in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, the transcription of the recorded data, and the grammatical annotation. Collecting the data was based on the principles of balance and naturality. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to the constant increase in studies on spontaneous communication, and various papers have dealt with a distribution of parts of speech, use of different grammatical forms, variation of inflectional paradigms, distribution of fillers, syntactic functions of adjectives, the mean length of utterances.
Abstract: Humanity faces more and more often with different social disasters, which in turn can generate new accidents and catastrophes. To mitigate their consequences, it is important to obtain early possible signals about the events which are or can occur and to prepare the corresponding scenarios that could be applied. Our research is focused on solving two problems in this domain: identifying signals related that an accident occurred or may occur and mitigation of some consequences of disasters. To solve the first problem, methods of selecting and processing texts from global network Internet are developed. Information in Romanian is of special interest for us. In order to obtain the mentioned tools, we should follow several steps, divided into preparatory stage and processing stage. Throughout the first stage, we manually collected over 724 news articles and classified them into 10 categories of social disasters. It constitutes more than 150 thousand words. Using this information, a controlled vocabulary of more than 300 keywords was elaborated, that will help in the process of classification and identification of the texts related to the field of social disasters. To solve the second problem, the formalism of Petri net has been used. We deal with the problem of inhabitants’ evacuation in useful time. The analysis methods such as reachability or coverability tree and invariants technique to determine dynamic properties of the modeled systems will be used. To perform a case study of properties of extended evacuation system by adding time, the analysis modules of PIPE such as Generalized Stochastic Petri Nets (GSPN) Analysis, Simulation, State Space Analysis, and Invariant Analysis have been used. These modules helped us to obtain the average number of persons situated in the rooms and the other quantitative properties and characteristics related to its dynamics.
Abstract: Emotions classification of text documents is applied to reveal if the document expresses a determined emotion from its writer. As different supervised methods are previously used for emotion documents’ classification, in this research we present a novel model that supports the classification algorithms for more accurate results by the support of TF-IDF measure. Different experiments have been applied to reveal the applicability of the proposed model, the model succeeds in raising the accuracy percentage according to the determined metrics (precision, recall, and f-measure) based on applying the refinement of the lexicon, integration of lexicons using different perspectives, and applying the TF-IDF weighting measure over the classifying features. The proposed model has also been compared with other research to prove its competence in raising the results’ accuracy.
Abstract: Document Analysis is an important research field that aims to gather the information by analyzing the data in documents. As one of the important targets for many fields is to understand what people actually want, sentimental analysis field has been one of the vital fields that are tightly related to the document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. The aim of this research is to detect the emotions from text documents based on enriching the lexicon with adapting their content based on semantic patterns extraction. The proposed approach has been presented, and different experiments are applied by different perspectives to reveal the positive impact of the proposed approach on the classification results.
Abstract: Sentiment analysis (SA) has received growing
attention in Arabic language research. However, few studies have yet
to directly apply SA to Arabic due to lack of a publicly available
dataset for this language. This paper partially bridges this gap due to
its focus on one of the Arabic dialects which is the Saudi dialect. This
paper presents annotated data set of 4700 for Saudi dialect sentiment
analysis with (K= 0.807). Our next work is to extend this corpus and
creation a large-scale lexicon for Saudi dialect from the corpus.
Abstract: This research study aims to present a retrospective
study about speech recognition systems and artificial intelligence.
Speech recognition has become one of the widely used technologies,
as it offers great opportunity to interact and communicate with
automated machines. Precisely, it can be affirmed that speech
recognition facilitates its users and helps them to perform their daily
routine tasks, in a more convenient and effective manner. This
research intends to present the illustration of recent technological
advancements, which are associated with artificial intelligence.
Recent researches have revealed the fact that speech recognition is
found to be the utmost issue, which affects the decoding of speech. In
order to overcome these issues, different statistical models were
developed by the researchers. Some of the most prominent statistical
models include acoustic model (AM), language model (LM), lexicon
model, and hidden Markov models (HMM). The research will help in
understanding all of these statistical models of speech recognition.
Researchers have also formulated different decoding methods, which
are being utilized for realistic decoding tasks and constrained
artificial languages. These decoding methods include pattern
recognition, acoustic phonetic, and artificial intelligence. It has been
recognized that artificial intelligence is the most efficient and reliable
methods, which are being used in speech recognition.
Abstract: Due to the rapid increase of Internet, web opinion
sources dynamically emerge which is useful for both potential
customers and product manufacturers for prediction and decision
purposes. These are the user generated contents written in natural
languages and are unstructured-free-texts scheme. Therefore, opinion
mining techniques become popular to automatically process customer
reviews for extracting product features and user opinions expressed
over them. Since customer reviews may contain both opinionated and
factual sentences, a supervised machine learning technique applies
for subjectivity classification to improve the mining performance. In
this paper, we dedicate our work is the task of opinion
summarization. Therefore, product feature and opinion extraction is
critical to opinion summarization, because its effectiveness
significantly affects the identification of semantic relationships. The
polarity and numeric score of all the features are determined by
Senti-WordNet Lexicon. The problem of opinion summarization
refers how to relate the opinion words with respect to a certain
feature. Probabilistic based model of supervised learning will
improve the result that is more flexible and effective.
Abstract: The need to extract R&D keywords from issues and use
them to retrieve R&D information is increasing rapidly. However, it is
difficult to identify related issues or distinguish them. Although the
similarity between issues cannot be identified, with an R&D lexicon,
issues that always share the same R&D keywords can be determined.
In detail, the R&D keywords that are associated with a particular issue
imply the key technology elements that are needed to solve a particular
issue.
Furthermore, the relationship among issues that share the same
R&D keywords can be shown in a more systematic way by clustering
them according to keywords. Thus, sharing R&D results and reusing
R&D technology can be facilitated. Indirectly, redundant investment
in R&D can be reduced as the relevant R&D information can be shared
among corresponding issues and the reusability of related R&D can be
improved. Therefore, a methodology to cluster issues from the
perspective of common R&D keywords is proposed to satisfy these
demands.
Abstract: Email has become a fast and cheap means of online
communication. The main threat to email is Unsolicited Bulk Email
(UBE), commonly called spam email. The current work aims at
identification of unigrams in more than 2700 UBE that advertise
body-enhancement drugs. The identification is based on the
requirement that the unigram is neither present in dictionary, nor is a
slang term. The motives of the paper are many fold. This is an
attempt to analyze spamming behaviour and employment of wordmutation
technique. On the side-lines of the paper, we have
attempted to better understand the spam, the slang and their interplay.
The problem has been addressed by employing Tokenization
technique and Unigram BOW model. We found that the non-lexicon
words constitute nearly 66% of total number of lexis of corpus
whereas non-slang words constitute nearly 2.4% of non-lexicon
words. Further, non-lexicon non-slang unigrams composed of 2
lexicon words, form more than 71% of the total number of such
unigrams. To the best of our knowledge, this is the first attempt to
analyze usage of non-lexicon non-slang unigrams in any kind of
UBE.
Abstract: This paper presents a rule-based text- to- speech
(TTS) Synthesis System for Standard Malay, namely SMaTTS. The
proposed system using sinusoidal method and some pre- recorded
wave files in generating speech for the system. The use of phone
database significantly decreases the amount of computer memory
space used, thus making the system very light and embeddable. The
overall system was comprised of two phases the Natural Language
Processing (NLP) that consisted of the high-level processing of text
analysis, phonetic analysis, text normalization and morphophonemic
module. The module was designed specially for SM to overcome
few problems in defining the rules for SM orthography system before
it can be passed to the DSP module. The second phase is the Digital
Signal Processing (DSP) which operated on the low-level process of
the speech waveform generation. A developed an intelligible and
adequately natural sounding formant-based speech synthesis system
with a light and user-friendly Graphical User Interface (GUI) is
introduced. A Standard Malay Language (SM) phoneme set and an
inclusive set of phone database have been constructed carefully for
this phone-based speech synthesizer. By applying the generative
phonology, a comprehensive letter-to-sound (LTS) rules and a
pronunciation lexicon have been invented for SMaTTS. As for the
evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was
compiled and several experiments have been performed to evaluate
the quality of the synthesized speech by analyzing the Mean Opinion
Score (MOS) obtained. The overall performance of the system as
well as the room for improvements was thoroughly discussed.