Margin-Based Feed-Forward Neural Network Classifiers

The margin-based principle was proposed long ago, and it has been shown, both theoretically and empirically, to reduce structural risk and improve classification performance. The feed-forward neural network, meanwhile, is a classical classifier that has recently regained popularity through deeper architectures. Its standard training algorithm, however, descends from the Widrow-Hoff rule, which minimizes the squared error. In this paper, we propose a new training algorithm for feed-forward neural networks based on the margin-based principle, which improves the accuracy and generalization ability of neural network classifiers using fewer labelled samples and a flexible network structure. We conducted experiments on four open UCI datasets and obtained good results, as expected. In conclusion, our model handles sparsely labelled, high-dimensional datasets with high accuracy, while converting an existing ANN implementation to our method requires little effort.
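The abstract does not spell out the exact loss formulation, so the following is only a minimal sketch of the general idea it describes: training a feed-forward network with a margin-based objective in place of the squared error of the Widrow-Hoff rule. It assumes a standard Crammer-Singer-style multiclass hinge loss; all function names and hyperparameters are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in, n_hidden, n_classes):
    """One hidden layer; small random weights, zero biases."""
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    return W1, b1, W2, b2

def forward(x, params):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)   # hidden activations
    s = h @ W2 + b2            # raw class scores
    return h, s

def margin_loss_grad(s, y, margin=1.0):
    """Multiclass hinge loss: penalize every class whose score comes
    within `margin` of the true class score (assumed formulation)."""
    viol = np.maximum(0.0, margin + s - s[y])
    viol[y] = 0.0
    g = (viol > 0).astype(float)  # d(loss)/d(score_j) = 1 for violators
    g[y] = -g.sum()               # true class pushed up by violator count
    return viol.sum(), g

def train_step(x, y, params, lr=0.05):
    """One SGD step; backprop is unchanged, only the loss differs."""
    W1, b1, W2, b2 = params
    h, s = forward(x, params)
    loss, gs = margin_loss_grad(s, y)
    gW2 = np.outer(h, gs)
    gb2 = gs
    gz = (W2 @ gs) * (1.0 - h ** 2)  # through tanh
    gW1 = np.outer(x, gz)
    gb1 = gz
    return loss, (W1 - lr * gW1, b1 - lr * gb1,
                  W2 - lr * gW2, b2 - lr * gb2)
```

Note that only `margin_loss_grad` differs from a squared-error trainer; the network architecture and backpropagation are untouched, which is consistent with the claim that migrating an existing ANN implementation is almost free of work.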



