Abstract: Recently, numerous documents including large
volumes of unstructured data and text have been created because of the
rapid increase in the use of social media and the Internet. These
documents are usually categorized for the convenience of users.
However, manual categorization does not guarantee accuracy, and it
requires a large amount of time and incurs high costs. Consequently,
many studies on automatic categorization have been conducted to
mitigate the limitations of manual categorization. Unfortunately, most
of these methods cannot be applied to categorize complex documents
with multiple topics because they work on the assumption that
individual documents can be categorized into single categories only.
Therefore, to overcome this limitation, some studies have attempted to
categorize each document into multiple categories. However, these
approaches require a multi-categorized document set for training.
Consequently, traditional multi-categorization algorithms cannot be
applied to most documents unless a multi-categorized training set is
provided. To address this problem, in this study, we review our novel
methodology for extending the category of a single-categorized document
to multiple categories, and then introduce a survey-based verification
scenario for estimating the accuracy of our automatic categorization
methodology.