Abstract: Recently, numerous documents including large
volumes of unstructured data and text have been created because of the
rapid increase in the use of social media and the Internet. Usually,
these documents are categorized for the convenience of users. However,
the accuracy of manual categorization is not guaranteed, and such
categorization requires a large amount of time and incurs huge costs.
Many studies on automatic categorization have been conducted to help
mitigate the limitations of manual categorization. Unfortunately, most
of these methods cannot be applied to categorize complex documents
with multiple topics because they work on the assumption that
individual documents can be categorized into single categories only.
Therefore, to overcome this limitation, some studies have attempted to
categorize each document into multiple categories. However, the
learning process employed in these studies involves training using a
multi-categorized document set. These methods therefore cannot be
applied to the multi-categorization of most documents unless a
multi-categorized training set is provided in advance. To overcome
this limitation, in this study, we
review our novel methodology for extending the category of a
single-categorized document to multiple categories, and then
introduce a survey-based verification scenario for estimating the
accuracy of our automatic categorization methodology.
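As an illustration of the kind of extension described above, the following sketch assigns additional categories to a single-labelled document whenever a classifier's confidence for another category comes close to the top score. This is not the paper's actual methodology; the threshold rule, function name, and scores are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's method): extend a
# single-categorized document to multiple categories by keeping every
# category whose classifier score is within a chosen ratio of the best.
def extend_categories(scores, ratio=0.8):
    """scores: dict mapping category -> confidence from a single-label
    classifier. Returns all categories scoring >= ratio * best score."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s >= ratio * best)

# Toy scores for one document (invented values, not real output).
doc_scores = {"sports": 0.91, "politics": 0.15, "health": 0.78}
print(extend_categories(doc_scores))  # both 'health' and 'sports' qualify
```

Any real system would tune the threshold (or learn per-category cutoffs) against a validation set rather than fix it at 0.8.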
Abstract: Despite the highly touted benefits, emerging
technologies have unleashed pervasive concerns regarding unintended
and unforeseen social impacts. Thus, those wishing to create safe and
socially acceptable products need to identify such side effects and
mitigate them prior to market proliferation. Various methodologies
in the field of technology assessment (TA), namely Delphi, impact
assessment, and scenario planning, have been widely incorporated in
such a circumstance. However, the existing literature faces a major
limitation in its sole reliance on participatory workshop activities,
overlooking a massive untapped source of futuristic information
available on the Internet. This
research thus seeks to gain insights into the utilization of futuristic
data, that is, future-oriented documents from the Internet, as a
supplementary method for generating social impact scenarios whilst
capturing the perspectives of experts from a wide variety of
disciplines. To this end,
network analysis is conducted based on the social keywords extracted
from the futuristic documents by text mining, which is then used as a
guide to produce a comprehensive set of detailed scenarios. Our
proposed approach facilitates harmonized depictions of possible
hazardous consequences of emerging technologies and thereby makes
decision makers more aware of, and responsive to, broad qualitative
uncertainties.
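The keyword-network step can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline; the documents and keyword list are invented placeholders, and a real analysis would extract keywords automatically via text mining rather than from a fixed list.

```python
from collections import Counter
from itertools import combinations

# Illustrative sketch: build a keyword co-occurrence network from
# future-oriented documents. Node weights count keyword frequency,
# edge weights count within-document co-occurrence; densely connected
# keyword clusters can then seed social-impact scenarios.
def cooccurrence_network(documents, keywords):
    nodes, edges = Counter(), Counter()
    for doc in documents:
        present = sorted(k for k in keywords if k in doc.lower())
        nodes.update(present)
        edges.update(combinations(present, 2))  # unordered keyword pairs
    return nodes, edges

# Placeholder documents standing in for crawled futuristic texts.
docs = [
    "Drones raise privacy and surveillance concerns",
    "Autonomous drones may displace jobs and erode privacy",
    "Surveillance technology and privacy regulation",
]
nodes, edges = cooccurrence_network(docs, ["privacy", "surveillance", "jobs", "drones"])
print(edges.most_common(3))  # strongest keyword pairings
```

The resulting edge weights can be fed to any graph tool for clustering and visualization; the scenario-writing step itself remains a human activity guided by the network.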
Abstract: Term Extraction, a key data preparation step in Text
Mining, extracts terms, i.e., relevant collocations of words,
attached to specific concepts (e.g., genetic-algorithms and
decision-trees are terms associated with the concept “Machine
Learning”). In
this paper, the task of extracting interesting collocations is achieved
through a supervised learning algorithm, exploiting a few
collocations manually labelled as interesting/not interesting. From
these examples, the ROGER algorithm learns a numerical function,
inducing a ranking on the collocations. This ranking is optimized
using genetic algorithms that maximize the trade-off between the false
positive and true positive rates (the Area Under the ROC Curve). This
approach uses a particular representation for the word collocations,
namely the vector of values corresponding to the standard statistical
interestingness measures attached to this collocation. As this
representation is general (over corpora and natural languages),
generality tests were performed by applying the ranking
function learned from an English corpus in Biology to a French
corpus of Curricula Vitae, and vice versa; the results show good
robustness of the approach compared to the state-of-the-art Support
Vector Machine (SVM).
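A toy version of the AUC-maximizing genetic search might look like the following. This is not the actual ROGER implementation: the synthetic data, mutation-only evolution scheme, and truncation selection are all assumptions chosen to keep the sketch short.

```python
import random

# Toy sketch (not ROGER itself): evolve a linear weight vector over
# per-collocation statistical measures so that the induced ranking
# maximizes the Area Under the ROC Curve (AUC).
def auc(scores, labels):
    """AUC = fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_scores(w, features):
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in features]

def evolve(features, labels, pop=20, gens=30, seed=0):
    """Mutation-only genetic search maximizing AUC of a linear ranker."""
    rng = random.Random(seed)
    dim = len(features[0])
    population = [[rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop)]
    def fitness(w):
        return auc(rank_scores(w, features), labels)
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]  # truncation selection
        children = [[wi + rng.gauss(0, 0.1) for wi in rng.choice(parents)]
                    for _ in range(pop - len(parents))]  # Gaussian mutation
        population = parents + children
    return max(population, key=fitness)

# Synthetic collocations: measure 1 is informative, measure 2 is noise.
data_rng = random.Random(1)
labels = [1, 0] * 15
X = [[data_rng.random() + (0.5 if l else 0.0), data_rng.random()]
     for l in labels]
w = evolve(X, labels)
print("learned AUC:", round(auc(rank_scores(w, X), labels), 3))
```

Because AUC depends only on the ordering of scores, not their magnitudes, it is a natural fitness function for a ranking learner; a real implementation would add crossover and proper cross-corpus evaluation.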
Abstract: With the extensive inclusion of documents, especially
text, in business systems, data mining no longer covers the full
scope of Business Intelligence: it cannot extract useful details
from large collections of unstructured and semi-structured materials
written in natural language.
The most pressing issue is drawing potential business intelligence
from text. To gain a competitive advantage, it is necessary to develop
a powerful new tool, text mining, that expands the scope of business
intelligence.
In this paper, we examine the strengths of text mining in
extracting business intelligence from the huge amount of textual
information sources within business systems. We apply text
mining to each stage of a Business Intelligence system to show that
text mining is a powerful tool for expanding the scope of BI. After
reviewing basic definitions and some related technologies, we
discuss their relationship to text mining and the benefits they
bring. Some examples and applications of text mining are also given.
The underlying motivation is to develop a new approach to effective
and efficient textual information analysis, thereby expanding the
scope of Business Intelligence.