Analysis of Textual Data Based On Multiple 2-Class Classification Models

This paper proposes a new method for analyzing textual data in which each item is described from multiple viewpoints. The method acquires a 2-class classification model for each viewpoint by applying an inductive learning method to items labeled with multiple viewpoints. Using these models, it infers whether each viewpoint applies to a new item. It then extracts expressions from the new items classified into each viewpoint and identifies the characteristic expressions of each viewpoint by comparing expression frequencies across viewpoints. The paper also applies the method to questionnaire data collected from hotel guests and verifies its effectiveness through numerical experiments.
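The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the inductive learner is replaced here by a simple centroid-difference scorer over whitespace tokens, and the frequency comparison is reduced to "strictly more frequent than in every other viewpoint". The function names, the viewpoints "room" and "meal", and the sample comments are all hypothetical, chosen only to show one 2-class model per viewpoint feeding the expression comparison.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokens stand in for extracted expressions.
    return text.lower().split()


def train_binary_model(items, viewpoint):
    """Learn one 2-class model: does `viewpoint` apply to an item or not?

    Stand-in learner (an assumption): each word is weighted by its mean
    count in positive items minus its mean count in negative items.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, labels in items:
        if viewpoint in labels:
            pos.update(tokenize(text))
            n_pos += 1
        else:
            neg.update(tokenize(text))
            n_neg += 1
    vocab = set(pos) | set(neg)
    return {w: pos[w] / max(n_pos, 1) - neg[w] / max(n_neg, 1) for w in vocab}


def has_viewpoint(model, text):
    # The viewpoint is assigned when the summed word weights are positive.
    return sum(model.get(w, 0.0) for w in tokenize(text)) > 0


def characteristic_expressions(assigned, viewpoints):
    """Keep words that occur more often in one viewpoint than in any other."""
    freq = {v: Counter() for v in viewpoints}
    for text, labels in assigned:
        for v in labels:
            freq[v].update(tokenize(text))
    result = {}
    for v in viewpoints:
        others = [freq[o] for o in viewpoints if o != v]
        result[v] = sorted(
            w for w, c in freq[v].items() if all(c > o[w] for o in others)
        )
    return result


# Hypothetical training items: (comment, set of viewpoints assigned to it).
train = [
    ("the room was clean and quiet", {"room"}),
    ("the room was small but clean", {"room"}),
    ("breakfast was tasty and fresh", {"meal"}),
    ("dinner was tasty but cold", {"meal"}),
    ("clean room and tasty breakfast", {"room", "meal"}),
    ("the staff was friendly", set()),
]
viewpoints = ["room", "meal"]

# One independent 2-class model per viewpoint.
models = {v: train_binary_model(train, v) for v in viewpoints}

# New, unlabeled items; each may receive zero, one, or several viewpoints.
new_items = [
    "quiet room with clean bed",
    "tasty dinner and fresh bread",
    "small room but tasty dinner",
]
assigned = [
    (text, {v for v in viewpoints if has_viewpoint(models[v], text)})
    for text in new_items
]
characteristic = characteristic_expressions(assigned, viewpoints)
```

Because each viewpoint has its own model, an item such as the third comment can be classified into both viewpoints at once, which a single multi-class model would not allow. In this toy setting function words can also surface as characteristic, so a real analysis would presumably need proper expression extraction and filtering.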




