Efficient Implementation of Serial and Parallel Support Vector Machine Training with a Multi-Parameter Kernel for Large-Scale Data Mining

This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode, we introduce a data transformation that allows the use of an expensive generalized kernel without additional cost. To speed up the decomposition algorithm, we examine the problem of working set selection for large data sets and study the influence of the working set size on the scalability of the parallel decomposition scheme. Our modifications and settings improve support vector learning performance and thus make extensive parameter search methods for optimizing classification accuracy affordable.
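The decomposition approach mentioned above repeatedly optimizes the SVM dual over a small working set of variables while the rest stay fixed, so the choice of the working set drives convergence speed. The following is a minimal sketch of one standard selection heuristic, the maximal-violating-pair rule; the function name and the exact rule are illustrative assumptions, not necessarily the selection scheme used in this paper.

```python
import numpy as np

def select_working_pair(grad, y, alpha, C, tol=1e-3):
    """Maximal-violating-pair working set selection (illustrative sketch).

    grad:  gradient of the dual objective 0.5*a'Qa - e'a at alpha
    y:     class labels in {-1, +1}
    alpha: current dual variables, 0 <= alpha <= C
    Returns an index pair (i, j) to optimize next, or None when the
    KKT conditions already hold within tol.
    """
    # Coordinates that may still move up / down without leaving the box.
    up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
    low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
    yg = -y * grad  # per-coordinate "violation scores"
    i = np.where(up)[0][np.argmax(yg[up])]    # most violating from above
    j = np.where(low)[0][np.argmin(yg[low])]  # most violating from below
    if yg[i] - yg[j] < tol:                   # KKT satisfied -> converged
        return None
    return int(i), int(j)

# Toy usage: at alpha = 0 the dual gradient is -e, so the scores equal y
# and the selected pair mixes one point from each class.
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.zeros(4)
grad = -np.ones(4)
pair = select_working_pair(grad, y, alpha, C=1.0)
```

Larger working sets trade more work per subproblem for fewer outer iterations; the abstract's scalability analysis concerns exactly this trade-off in the parallel setting.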




