Abstract: Emotions classification of text documents is applied to reveal if the document expresses a determined emotion from its writer. As different supervised methods are previously used for emotion documents’ classification, in this research we present a novel model that supports the classification algorithms for more accurate results by the support of TF-IDF measure. Different experiments have been applied to reveal the applicability of the proposed model, the model succeeds in raising the accuracy percentage according to the determined metrics (precision, recall, and f-measure) based on applying the refinement of the lexicon, integration of lexicons using different perspectives, and applying the TF-IDF weighting measure over the classifying features. The proposed model has also been compared with other research to prove its competence in raising the results’ accuracy.
Abstract: Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.
Abstract: Data fusion technology can be the best way to extract
useful information from multiple sources of data. It has been widely
applied in various applications. This paper presents a data fusion
approach in multimedia data for event detection in twitter by using
Dempster-Shafer evidence theory. The methodology applies a mining
algorithm to detect the event. There are two types of data in the
fusion. The first is features extracted from text by using the bag-ofwords
method which is calculated using the term frequency-inverse
document frequency (TF-IDF). The second is the visual features
extracted by applying scale-invariant feature transform (SIFT). The
Dempster - Shafer theory of evidence is applied in order to fuse the
information from these two sources. Our experiments have indicated
that comparing to the approaches using individual data source, the
proposed data fusion approach can increase the prediction accuracy
for event detection. The experimental result showed that the proposed
method achieved a high accuracy of 0.97, comparing with 0.93 with
texts only, and 0.86 with images only.
Abstract: The web’s increased popularity has included a huge
amount of information, due to which automated web page
classification systems are essential to improve search engines’
performance. Web pages have many features like HTML or XML
tags, hyperlinks, URLs and text contents which can be considered
during an automated classification process. It is known that Webpage
classification is enhanced by hyperlinks as it reflects Web page
linkages. The aim of this study is to reduce the number of features to
be used to improve the accuracy of the classification of web pages. In
this paper, a novel feature selection method using an improved
Particle Swarm Optimization (PSO) using principle of evolution is
proposed. The extracted features were tested on the WebKB dataset
using a parallel Neural Network to reduce the computational cost.
Abstract: This paper presents a system for discovering
association rules from collections of unstructured documents called
EART (Extract Association Rules from Text). The EART system
treats texts only not images or figures. EART discovers association
rules amongst keywords labeling the collection of textual documents.
The main characteristic of EART is that the system integrates XML
technology (to transform unstructured documents into structured
documents) with Information Retrieval scheme (TF-IDF) and Data
Mining technique for association rules extraction. EART depends on
word feature to extract association rules. It consists of four phases:
structure phase, index phase, text mining phase and visualization
phase. Our work depends on the analysis of the keywords in the
extracted association rules through the co-occurrence of the keywords
in one sentence in the original text and the existing of the keywords
in one sentence without co-occurrence. Experiments applied on a
collection of scientific documents selected from MEDLINE that are
related to the outbreak of H5N1 avian influenza virus.