Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

This paper explores the scalability issues of solving the Named Entity Recognition (NER) problem with Support Vector Machines (SVMs) and high-dimensional features. It examines the performance results of a set of experiments that apply binary and multi-class SVMs to training sets of increasing size. The experiments target the biomedical publications domain, chosen for its importance and inherent challenges. The machine learning approach is deliberately simple: it uses no prior language knowledge, such as part-of-speech or noun-phrase tagging, which makes it applicable across languages, and it includes no domain-specific knowledge. The accuracy achieved is comparable to that of more complex approaches, which motivates investigating ways to improve the scalability of multi-class SVMs and so make the solution more practical and usable. Reducing the training time of multi-class SVMs would make support vector machines a more viable machine learning solution for real-world problems with large datasets. An initial prototype substantially reduces training time at the expense of higher memory requirements.



