On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared-memory parallel mode, we introduce a transformation of the training data that allows an expensive generalized kernel to be used at no additional cost. We present experiments for the Gaussian kernel, but other kernel functions can be used as well. To further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set size on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and improved the overall support vector machine learning performance. Our method allows extensive parameter search to be used to optimize classification accuracy.
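The abstract does not spell out the data transformation. A plausible sketch, assuming the generalized kernel is a multi-parameter Gaussian kernel with one width σ_i per feature, k(x, z) = exp(-Σ_i (x_i - z_i)² / (2σ_i²)): scaling each feature once by 1/(√2·σ_i) before training lets the standard single-parameter Gaussian kernel reproduce the multi-parameter one, so the per-evaluation cost of the richer kernel vanishes. All function names here are illustrative, not from the paper.

```python
import numpy as np

def multi_param_gaussian(x, z, sigmas):
    # Generalized Gaussian kernel with one width sigma_i per feature:
    # k(x, z) = exp(-sum_i (x_i - z_i)^2 / (2 * sigma_i^2))
    d = (x - z) / sigmas
    return np.exp(-0.5 * np.dot(d, d))

def transform(X, sigmas):
    # Scale each feature by 1/(sqrt(2) * sigma_i) once, before training.
    # On the transformed data the standard Gaussian kernel
    # exp(-||x' - z'||^2) equals the multi-parameter kernel above,
    # so the expensive kernel incurs no extra cost per evaluation.
    return X / (np.sqrt(2.0) * sigmas)

def standard_gaussian(xp, zp):
    # Plain Gaussian kernel with unit width on transformed points.
    d = xp - zp
    return np.exp(-np.dot(d, d))
```

A useful side effect of this formulation is that per-feature widths become ordinary preprocessing parameters, which fits the extensive parameter search mentioned in the abstract.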
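The abstract likewise leaves the working set selection rule unspecified. A common first-order heuristic from the decomposition literature (the "maximal violating pair" rule used, e.g., in LIBSVM-style solvers) is sketched below under the standard C-SVM dual formulation; it is shown only as context for the selection problem the paper analyzes, not as the authors' method.

```python
import numpy as np

def select_working_pair(grad, y, alpha, C, tol=1e-3):
    # First-order "maximal violating pair" selection for the C-SVM dual:
    # pick i maximizing -y_i * grad_i over the "up" set and
    # j minimizing it over the "down" set.
    up = ((alpha < C) & (y > 0)) | ((alpha > 0) & (y < 0))
    down = ((alpha < C) & (y < 0)) | ((alpha > 0) & (y > 0))
    f = -y * grad
    i = int(np.argmax(np.where(up, f, -np.inf)))
    j = int(np.argmin(np.where(down, f, np.inf)))
    if f[i] - f[j] < tol:
        return None  # KKT conditions satisfied to tolerance: stop
    return i, j
```

Larger working sets trade fewer, more expensive subproblems for more, cheaper ones, which is exactly the scalability trade-off the abstract says is analyzed for the parallel scheme.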




