The margin-based principle has a long history, and it has been shown, in both theoretical and practical respects, to reduce structural risk and improve classification performance. Meanwhile, the feed-forward neural network is a traditional classifier that is currently receiving renewed attention in deeper architectures. However, the standard training algorithm for feed-forward neural networks derives from the Widrow-Hoff principle, which minimizes the squared error. In this paper, we propose a new training algorithm for feed-forward neural networks based on the margin-based principle, which effectively improves the accuracy and generalization ability of neural network classifiers with fewer labelled samples and a flexible network structure. We conducted experiments on four open UCI datasets and obtained the expected good results. In conclusion, our model handles sparsely labelled, high-dimensional datasets with high accuracy, while migrating an existing ANN method to ours is easy and requires almost no extra work.
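A minimal sketch, not the paper's actual algorithm (whose details are not given in this abstract), of the two training principles contrasted above for a single linear unit: Widrow-Hoff updates minimize squared error on every sample, while a margin-based (hinge-loss) rule updates only on samples that violate a unit margin. All names and the toy data below are illustrative assumptions.

```python
import numpy as np

def train_widrow_hoff(X, y, lr=0.01, epochs=200):
    """Least-mean-squares (Widrow-Hoff) rule: w += lr * (y - w.x) * x,
    i.e. a gradient step on the squared error for every sample."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - w @ xi        # residual of the linear output
            w += lr * err * xi       # squared-error gradient step
    return w

def train_margin(X, y, lr=0.01, epochs=200):
    """Margin-based (hinge-loss) rule: step only when the sample's
    margin y * (w.x) falls below 1, i.e. when it violates the margin."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) < 1:    # sample inside or on wrong side of margin
                w += lr * yi * xi
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Linearly separable toy data (labels in {-1, +1}), bias feature appended.
    X = np.vstack([rng.normal(+2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
    X = np.hstack([X, np.ones((100, 1))])
    y = np.array([1] * 50 + [-1] * 50)
    for train in (train_widrow_hoff, train_margin):
        w = train(X, y)
        print(train.__name__, "accuracy:", np.mean(np.sign(X @ w) == y))
```

On separable data both rules classify correctly; the margin rule, however, stops updating once every sample clears the margin, which is the property the max-margin principle exploits to control structural risk.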
@article{XiaoZhu:70007,
  author   = "Han Xiao and Xiaoyan Zhu",
  title    = "Margin-Based Feed-Forward Neural Network Classifiers",
  journal  = "International Journal of Information, Control and Computer Sciences",
  abstract = "The margin-based principle has a long history, and it has been
              shown, in both theoretical and practical respects, to reduce
              structural risk and improve classification performance. Meanwhile,
              the feed-forward neural network is a traditional classifier that
              is currently receiving renewed attention in deeper architectures.
              However, the standard training algorithm for feed-forward neural
              networks derives from the Widrow-Hoff principle, which minimizes
              the squared error. In this paper, we propose a new training
              algorithm for feed-forward neural networks based on the
              margin-based principle, which effectively improves the accuracy
              and generalization ability of neural network classifiers with
              fewer labelled samples and a flexible network structure. We
              conducted experiments on four open UCI datasets and obtained the
              expected good results. In conclusion, our model handles sparsely
              labelled, high-dimensional datasets with high accuracy, while
              migrating an existing ANN method to ours is easy and requires
              almost no extra work.",
  keywords = "Max-Margin Principle, Feed-Forward Neural Network, Classifier",
  volume   = "9",
  number   = "5",
  pages    = "1290--1296",
}