Abstract: Sentiment analysis and opinion mining have become
emerging topics of research in recent years but most of the work
is focused on data in the English language. A comprehensive
research and analysis are essential which considers multiple
languages, machine translation techniques, and different classifiers.
This paper presents, a comparative analysis of different approaches
for multilingual sentiment analysis. These approaches are divided
into two parts: one using classification of text without language
translation and second using the translation of testing data to a
target language, such as English, before classification. The presented
research and results are useful for understanding whether machine
translation should be used for multilingual sentiment analysis or
building language specific sentiment classification systems is a better
approach. The effects of language translation techniques, features,
and accuracy of various classifiers for multilingual sentiment analysis
is also discussed in this study.
Abstract: Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.
Abstract: Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.
Abstract: Sentiment analysis (SA) has received growing
attention in Arabic language research. However, few studies have yet
to directly apply SA to Arabic due to lack of a publicly available
dataset for this language. This paper partially bridges this gap due to
its focus on one of the Arabic dialects which is the Saudi dialect. This
paper presents annotated data set of 4700 for Saudi dialect sentiment
analysis with (K= 0.807). Our next work is to extend this corpus and
creation a large-scale lexicon for Saudi dialect from the corpus.
Abstract: Sentiment analysis means to classify a given review
document into positive or negative polar document. Sentiment
analysis research has been increased tremendously in recent times
due to its large number of applications in the industry and academia.
Sentiment analysis models can be used to determine the opinion of
the user towards any entity or product. E-commerce companies can
use sentiment analysis model to improve their products on the basis
of users’ opinion. In this paper, we propose a new One-class Support
Vector Machine (One-class SVM) based sentiment analysis model
for movie review documents. In the proposed approach, we initially
extract features from one class of documents, and further test the
given documents with the one-class SVM model if a given new test
document lies in the model or it is an outlier. Experimental results
show the effectiveness of the proposed sentiment analysis model.
Abstract: Traditional document representation for classification
follows Bag of Words (BoW) approach to represent the term weights.
The conventional method uses the Vector Space Model (VSM) to
exploit the statistical information of terms in the documents and they
fail to address the semantic information as well as order of the terms
present in the documents. Although, the phrase based approach
follows the order of the terms present in the documents rather than
semantics behind the word. Therefore, a semantic concept based
approach is used in this paper for enhancing the semantics by
incorporating the ontology information. In this paper a novel method
is proposed to forecast the intraday stock market price directional
movement based on the sentiments from Twitter and money control
news articles. The stock market forecasting is a very difficult and
highly complicated task because it is affected by many factors such
as economic conditions, political events and investor’s sentiment etc.
The stock market series are generally dynamic, nonparametric, noisy
and chaotic by nature. The sentiment analysis along with wisdom of
crowds can automatically compute the collective intelligence of
future performance in many areas like stock market, box office sales
and election outcomes. The proposed method utilizes collective
sentiments for stock market to predict the stock price directional
movements. The collective sentiments in the above social media have
powerful prediction on the stock price directional movements as
up/down by using Granger Causality test.
Abstract: Large-scale data stream analysis has become one of
the important business and research priorities lately. Social networks
like Twitter and other micro-blogging platforms hold an enormous
amount of data that is large in volume, velocity and variety.
Extracting valuable information and trends out of these data would
aid in a better understanding and decision-making. Multiple analysis
techniques are deployed for English content. Moreover, one of the
languages that produce a large amount of data over social networks
and is least analyzed is the Arabic language. The proposed paper is a
survey on the research efforts to analyze the Arabic content in
Twitter focusing on the tools and methods used to extract the
sentiments for the Arabic content on Twitter.
Abstract: The dramatic rise in the use of Social Media (SM)
platforms such as Facebook and Twitter provide access to an
unprecedented amount of user data. Users may post reviews on
products and services they bought, write about their interests, share
ideas or give their opinions and views on political issues. There is a
growing interest in the analysis of SM data from organisations for
detecting new trends, obtaining user opinions on their products and
services or finding out about their online reputations. A recent
research trend in SM analysis is making predictions based on
sentiment analysis of SM. Often indicators of historic SM data are
represented as time series and correlated with a variety of real world
phenomena like the outcome of elections, the development of
financial indicators, box office revenue and disease outbreaks. This
paper examines the current state of research in the area of SM mining
and predictive analysis and gives an overview of the analysis
methods using opinion mining and machine learning techniques.
Abstract: This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm and disregarding the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system of analysis of congruence between the notes. This ensured satisfactory results even without using specialized annotators in the context of the research, avoiding the generation of biased training data for the classifiers. For this, a case
study was conducted in a blog of entrepreneurship. The experimental results were consistent with the literature related annotation using formalized process with experts.
Abstract: Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Abstract: This article deals with the popularity of candidates for the president of the United States of America. The popularity is assessed according to public comments on the Web 2.0. Social networking, blogging and online forums (collectively Web 2.0) are for common Internet users the easiest way to share their personal opinions, thoughts, and ideas with the entire world. However, the web content diversity, variety of technologies and website structure differences, all of these make the Web 2.0 a network of heterogeneous data, where things are difficult to find for common users. The introductory part of the article describes methodology for gathering and processing data from Web 2.0. The next part of the article is focused on the evaluation and content analysis of obtained information, which write about presidential candidates.
Abstract: Web 2.0 (social networking, blogging and online
forums) can serve as a data source for social science research because
it contains vast amount of information from many different users.
The volume of that information has been growing at a very high rate
and becoming a network of heterogeneous data; this makes things
difficult to find and is therefore not almost useful. We have proposed
a novel theoretical model for gathering and processing data from
Web 2.0, which would reflect semantic content of web pages in
better way. This article deals with the analysis part of the model and
its usage for content analysis of blogs. The introductory part of the
article describes methodology for the gathering and processing data
from blogs. The next part of the article is focused on the evaluation
and content analysis of blogs, which write about specific trend.