An Improved Learning Algorithm based on the Conjugate Gradient Method for Back Propagation Neural Networks
The conjugate gradient optimization algorithm,
usually used for nonlinear least squares problems, is presented
and combined with a modified back-propagation algorithm,
yielding a new fast-training multilayer perceptron (MLP)
algorithm (CGFR/AG). The approach presented in this paper
consists of three steps: (1) modifying the standard back-propagation
algorithm by introducing a gain-variation term in the activation
function; (2) calculating the gradient of the error with
respect to the weight and gain values; and (3) determining
the new search direction by exploiting the gradient information
computed in step (2) together with the previous
search direction. The proposed method improves the training
efficiency of the back-propagation algorithm by adaptively modifying
the initial search direction. The performance of the proposed method
is demonstrated by comparison with the conjugate gradient algorithm
from the neural network toolbox on the chosen benchmarks. The
results show that the number of iterations the
proposed method requires to converge is less than 20% of that required
by the standard conjugate gradient and neural network toolbox
algorithms.
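The three steps in the abstract can be illustrated with a minimal numerical sketch: an MLP whose sigmoid activations carry an adjustable gain, whose weights and gains are optimized together, and whose search directions follow the Fletcher-Reeves conjugate gradient update. The network size, the finite-difference gradient, and the backtracking line search below are illustrative assumptions, not the paper's exact CGFR/AG formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # XOR inputs
y = np.array([[0], [1], [1], [0]], float)              # XOR targets

n_in, n_hid, n_out = 2, 3, 1
sizes = [(n_hid, n_in + 1), (n_out, n_hid + 1)]        # weights incl. bias column
n_w = sum(r * c for r, c in sizes)

def unpack(theta):
    W1 = theta[:sizes[0][0] * sizes[0][1]].reshape(sizes[0])
    W2 = theta[sizes[0][0] * sizes[0][1]:n_w].reshape(sizes[1])
    c1, c2 = theta[n_w], theta[n_w + 1]                # per-layer activation gains
    return W1, W2, c1, c2

def sigmoid(a, c):
    # step (1): sigmoid with an explicit gain term c
    return 1.0 / (1.0 + np.exp(-c * a))

def loss(theta):
    W1, W2, c1, c2 = unpack(theta)
    Xb = np.hstack([X, np.ones((4, 1))])
    h = sigmoid(Xb @ W1.T, c1)
    hb = np.hstack([h, np.ones((4, 1))])
    out = sigmoid(hb @ W2.T, c2)
    return 0.5 * np.mean((out - y) ** 2)

def grad(theta, eps=1e-6):
    # step (2): gradient of the error w.r.t. weights AND gains
    # (central finite differences here, for simplicity; the paper derives it analytically)
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

theta = np.concatenate([rng.normal(0, 1, n_w), [1.0, 1.0]])  # gains start at 1
g = grad(theta)
d = -g                                                # initial steepest-descent direction
L0 = loss(theta)
for _ in range(200):
    # backtracking line search along the current conjugate direction d
    step = 1.0
    while loss(theta + step * d) > loss(theta) and step > 1e-8:
        step *= 0.5
    theta = theta + step * d
    g_new = grad(theta)
    # step (3): new direction from the current gradient and the previous direction
    beta = (g_new @ g_new) / (g @ g)                  # Fletcher-Reeves beta
    d = -g_new + beta * d
    if d @ g_new > 0:                                 # restart if not a descent direction
        d = -g_new
    g = g_new
print(L0, loss(theta))
```

Because the line search never accepts an uphill step of meaningful size, the error decreases monotonically; the Fletcher-Reeves update reuses the previous direction exactly as step (3) describes.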
@article{"International Journal of Information, Control and Computer Sciences:49585",
  author   = "N. M. Nawi and M. R. Ransing and R. S. Ransing",
  title    = "An Improved Learning Algorithm based on the Conjugate Gradient Method for Back Propagation Neural Networks",
  abstract = "The conjugate gradient optimization algorithm, usually used for nonlinear least squares problems, is presented and combined with a modified back-propagation algorithm, yielding a new fast-training multilayer perceptron (MLP) algorithm (CGFR/AG). The approach presented in this paper consists of three steps: (1) modifying the standard back-propagation algorithm by introducing a gain-variation term in the activation function; (2) calculating the gradient of the error with respect to the weight and gain values; and (3) determining the new search direction by exploiting the gradient information computed in step (2) together with the previous search direction. The proposed method improves the training efficiency of the back-propagation algorithm by adaptively modifying the initial search direction. The performance of the proposed method is demonstrated by comparison with the conjugate gradient algorithm from the neural network toolbox on the chosen benchmarks. The results show that the number of iterations the proposed method requires to converge is less than 20% of that required by the standard conjugate gradient and neural network toolbox algorithms.",
  keywords = "Back-propagation, activation function, conjugate gradient, search direction, gain variation",
  volume   = "2",
  number   = "8",
  pages    = "2567-5",
}