Improvement of a Label Extraction Method for a Risk Search System
This paper proposes a method for improving the classification
efficiency of a classification model. The model is used
in a risk search system and extracts specific labels from articles
posted on bulletin board sites, which lets the system analyze the
important discussions that the articles form. The method
introduces ensemble learning, which combines multiple classification
models, and incorporates expressions related to the specific labels
into the generation of word vectors. The paper applies the
method to articles collected from three bulletin board sites selected
by users and verifies its effectiveness.
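
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the expression list, the nearest-centroid base model, the bootstrap-and-vote (bagging) scheme, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter

# Hypothetical label-related expressions; the paper's actual list is not given here.
LABEL_EXPRESSIONS = ["does not work", "broke down"]


def vectorize(text, vocabulary, expressions=LABEL_EXPRESSIONS):
    """Word-count vector extended with one 0/1 slot per label-related expression."""
    lowered = text.lower()
    counts = Counter(lowered.split())
    word_part = [counts[word] for word in vocabulary]
    expr_part = [1 if expr in lowered else 0 for expr in expressions]
    return word_part + expr_part


def train_centroid(vectors, labels):
    """Stand-in base model (an assumption): the mean vector of each label."""
    sums, sizes = {}, Counter(labels)
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(model, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(sorted(model), key=lambda lab: dist(model[lab]))


def train_bagging(vectors, labels, n_models=5, seed=0):
    """Train each base model on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(vectors)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_centroid([vectors[i] for i in idx],
                                     [labels[i] for i in idx]))
    return models


def predict_bagging(models, vec):
    """Combine the base models by majority vote."""
    votes = Counter(predict_centroid(model, vec) for model in models)
    return votes.most_common(1)[0][0]
```

With a vocabulary such as ["printer", "great"], an article containing "broke down" gets a 1 in the matching expression slot even though neither word is in the vocabulary, which is the kind of signal the abstract describes adding to the word vectors.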
@article{Sakurai_Orihara:56440,
  author = "Shigeaki Sakurai and Ryohei Orihara",
  title = "Improvement of a Label Extraction Method for a Risk Search System",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "This paper proposes a method for improving the classification
    efficiency of a classification model. The model is used
    in a risk search system and extracts specific labels from articles
    posted on bulletin board sites, which lets the system analyze the
    important discussions that the articles form. The method
    introduces ensemble learning, which combines multiple classification
    models, and incorporates expressions related to the specific labels
    into the generation of word vectors. The paper applies the
    method to articles collected from three bulletin board sites selected
    by users and verifies its effectiveness.",
  keywords = "Text mining, Risk search system, Corporate reputation, Bulletin board site, Ensemble learning",
  volume = "3",
  number = "3",
  pages = "701--708"
}