An Improved Learning Algorithm based on the Conjugate Gradient Method for Back Propagation Neural Networks

The conjugate gradient optimization algorithm, commonly used for nonlinear least squares problems, is presented and combined with a modified back propagation algorithm, yielding a new fast training algorithm for multilayer perceptrons (MLPs), denoted CGFR/AG. The approach presented in the paper consists of three steps: (1) modifying the standard back propagation algorithm by introducing a gain-variation term in the activation function, (2) calculating the gradient of the error with respect to both the weight and gain values, and (3) determining the new search direction from the gradient computed in step (2) together with the previous search direction. The proposed method improves the training efficiency of the back propagation algorithm by adaptively modifying the initial search direction. Its performance is demonstrated by comparison with the conjugate gradient algorithm from the neural network toolbox on the chosen benchmark. The results show that the number of iterations the proposed method requires to converge is less than 20% of that required by the standard conjugate gradient and the neural network toolbox algorithm.
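
To make the three steps concrete, the following minimal NumPy sketch (not the authors' code) illustrates the general idea: a one-hidden-layer MLP whose sigmoid activations carry trainable gains, trained with Fletcher-Reeves conjugate-gradient steps on the concatenated gradient of the error with respect to both weights and gains. The network size, parameter layout, backtracking line search, and all function names are illustrative assumptions, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x, c):
    # sigmoid activation with gain c: f(x) = 1 / (1 + exp(-c*x))
    return 1.0 / (1.0 + np.exp(-c * x))

def unpack(p, n_in, n_hid, n_out):
    # split the flat parameter vector into weight matrices and gain vectors
    i = 0
    W1 = p[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    W2 = p[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    c1 = p[i:i + n_hid]; i += n_hid
    c2 = p[i:i + n_out]
    return W1, W2, c1, c2

def loss_and_grad(p, X, T, n_in, n_hid, n_out):
    # sum-of-squares error and its gradient w.r.t. weights AND gains (step 2)
    W1, W2, c1, c2 = unpack(p, n_in, n_hid, n_out)
    a1 = X @ W1                        # hidden pre-activations
    h = sigmoid(a1, c1)
    a2 = h @ W2                        # output pre-activations
    y = sigmoid(a2, c2)
    e = y - T
    E = 0.5 * np.sum(e ** 2)
    # with gain c, dy/da = c*y*(1-y) and dy/dc = a*y*(1-y)
    d2 = e * y * (1 - y)               # common output-layer factor
    gW2 = h.T @ (d2 * c2)
    gc2 = np.sum(d2 * a2, axis=0)
    d1 = (d2 * c2) @ W2.T * h * (1 - h)
    gW1 = X.T @ (d1 * c1)
    gc1 = np.sum(d1 * a1, axis=0)
    g = np.concatenate([gW1.ravel(), gW2.ravel(), gc1, gc2])
    return E, g

def train_cg(X, T, n_hid=5, iters=200):
    n_in, n_out = X.shape[1], T.shape[1]
    n_par = n_in * n_hid + n_hid * n_out + n_hid + n_out
    p = rng.normal(scale=0.5, size=n_par)
    # gains start at 1 so the initial network is an ordinary sigmoid MLP
    p[n_in * n_hid + n_hid * n_out:] = 1.0
    E, g = loss_and_grad(p, X, T, n_in, n_hid, n_out)
    d = -g                                         # initial search direction
    for k in range(iters):
        # simple backtracking line search along d (an assumption; the paper
        # may use a different step-length rule)
        step = 1.0
        while step > 1e-10:
            E_new, _ = loss_and_grad(p + step * d, X, T, n_in, n_hid, n_out)
            if E_new < E:
                break
            step *= 0.5
        p = p + step * d
        E, g_new = loss_and_grad(p, X, T, n_in, n_hid, n_out)
        beta = (g_new @ g_new) / (g @ g + 1e-12)   # Fletcher-Reeves beta
        d = -g_new + beta * d                      # new conjugate direction (step 3)
        g = g_new
    return p, E

# Usage example: train on the 2-bit XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
p, E = train_cg(X, T)
print("final sum-of-squares error:", E)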



