Improvement of a Label Extraction Method for a Risk Search System

This paper proposes a method for improving the classification efficiency of a classification model. The model is used in a risk search system and extracts specific labels from articles posted on bulletin board sites. The system can analyze the important discussions that these articles make up. The proposed method introduces ensemble learning, which combines multiple classification models, and incorporates expressions related to the specific labels into the generation of word vectors. The method is applied to articles collected from three bulletin board sites selected by users, and its effectiveness is verified.
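
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which ensemble method or base classifier is used, so the sketch assumes bagging (bootstrap samples combined by majority vote) over a simple nearest-centroid learner. All function names, the vocabulary, the label expressions, and the sample articles are hypothetical.

```python
import random
from collections import Counter


def vectorize(text, vocabulary, label_expressions):
    """Bag-of-words counts plus one binary feature per label-related expression."""
    lowered = text.lower()
    counts = Counter(lowered.split())
    word_part = [float(counts[w]) for w in vocabulary]
    # Assumption: an expression counts as present when it occurs as a substring.
    expr_part = [1.0 if e in lowered else 0.0 for e in label_expressions]
    return word_part + expr_part


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroid_model(vectors, labels):
    """Nearest-centroid base learner (a stand-in, not the paper's classifier)."""
    pos = centroid([v for v, y in zip(vectors, labels) if y])
    neg = centroid([v for v, y in zip(vectors, labels) if not y])
    return lambda v: sq_dist(v, pos) < sq_dist(v, neg)


def train_bagged_models(vectors, labels, n_models=5, seed=0):
    """Train each model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    indices = range(len(vectors))
    models = []
    while len(models) < n_models:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if all(ys) or not any(ys):
            continue  # redraw: the sample must contain both classes
        models.append(train_centroid_model([vectors[i] for i in sample], ys))
    return models


def predict(models, vector):
    """Assign the label when more than half of the models vote for it."""
    votes = sum(1 for m in models if m(vector))
    return votes * 2 > len(models)


# Hypothetical training data: True means the article carries the label.
articles = [
    ("the battery caught fire and smoke came out", True),
    ("smoke and the battery caught fire after charging", True),
    ("the device overheated and caught fire", True),
    ("great screen and long battery life", False),
    ("the price is fine and shipping was fast", False),
    ("i like the screen and the price", False),
]
vocabulary = ["battery", "fire", "smoke", "screen", "price"]
label_expressions = ["caught fire", "overheated"]

X = [vectorize(text, vocabulary, label_expressions) for text, _ in articles]
y = [label for _, label in articles]
models = train_bagged_models(X, y)

new_article = vectorize("my phone overheated and caught fire",
                        vocabulary, label_expressions)
print(predict(models, new_article))  # True
```

The expression features sit next to the word counts in one vector, so any base learner can use them without modification. Swapping in a different base learner only requires replacing `train_centroid_model` with a function that returns a callable of the same shape.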



