An Improved Conjugate Gradient Based Learning Algorithm for Back Propagation Neural Networks

A conjugate gradient optimization algorithm is combined with a modified back propagation algorithm to yield a computationally efficient method, CGFR/AG, for training multilayer perceptron (MLP) networks. The computational efficiency is enhanced by adaptively modifying the initial search direction in the following steps: (1) modifying the standard back propagation algorithm by introducing a gain variation term into the activation function; (2) calculating the gradient of the error with respect to the weights and the gain values; and (3) determining a new search direction using the information calculated in step (2). The performance of the proposed method is demonstrated by comparing its accuracy and computation time with those of the conjugate gradient algorithm in the MATLAB Neural Network Toolbox. The results show that the proposed method is computationally more efficient than the standard conjugate gradient algorithm.
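The three steps above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it rests on several assumptions. It uses a one-hidden-layer MLP whose units compute sigmoid(c * net) with a trainable gain c per unit (step 1). It backpropagates the sum-of-squares error to both the weights and the gains (step 2). It reads the "FR" in CGFR/AG as the Fletcher-Reeves update, beta = ||g_new||^2 / ||g_old||^2, when forming each new search direction (step 3). Treating weights and gains as a single parameter vector, the backtracking line search, the restart rule, and the names `loss_and_grad`, `init_params` and `train_cgfr` are all hypothetical choices made for this illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grad(p, data, n_in, n_hid, n_out):
    # Sum-of-squares error and its gradient w.r.t. weights, biases AND gains.
    # Flat layout (assumed): [w1, b1, c1 (hidden gains), w2, b2, c2 (output gains)]
    o_b1 = n_hid * n_in
    o_c1 = o_b1 + n_hid
    o_w2 = o_c1 + n_hid
    o_b2 = o_w2 + n_out * n_hid
    o_c2 = o_b2 + n_out
    g, err = [0.0] * len(p), 0.0
    for x, t in data:
        # Forward pass: each unit outputs sigmoid(gain * net input).
        net_h = [sum(p[j * n_in + i] * x[i] for i in range(n_in)) + p[o_b1 + j]
                 for j in range(n_hid)]
        h = [sigmoid(p[o_c1 + j] * net_h[j]) for j in range(n_hid)]
        net_o = [sum(p[o_w2 + k * n_hid + j] * h[j] for j in range(n_hid)) + p[o_b2 + k]
                 for k in range(n_out)]
        y = [sigmoid(p[o_c2 + k] * net_o[k]) for k in range(n_out)]
        # Backward pass; delta = dE / d(gain * net).
        d_h = [0.0] * n_hid
        for k in range(n_out):
            err += 0.5 * (y[k] - t[k]) ** 2
            delta, c = (y[k] - t[k]) * y[k] * (1.0 - y[k]), p[o_c2 + k]
            g[o_c2 + k] += delta * net_o[k]  # output gain gradient
            g[o_b2 + k] += delta * c
            for j in range(n_hid):
                g[o_w2 + k * n_hid + j] += delta * c * h[j]
                d_h[j] += delta * c * p[o_w2 + k * n_hid + j]
        for j in range(n_hid):
            delta, c = d_h[j] * h[j] * (1.0 - h[j]), p[o_c1 + j]
            g[o_c1 + j] += delta * net_h[j]  # hidden gain gradient
            g[o_b1 + j] += delta * c
            for i in range(n_in):
                g[j * n_in + i] += delta * c * x[i]
    return err, g


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def init_params(n_in, n_hid, n_out, seed=0):
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(n_hid * n_in)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_out * n_hid)]
    # Biases start at 0, gains at 1 (the ordinary logistic sigmoid).
    return w1 + [0.0] * n_hid + [1.0] * n_hid + w2 + [0.0] * n_out + [1.0] * n_out


def train_cgfr(p, data, dims, epochs=500, tol=1e-4):
    err, g = loss_and_grad(p, data, *dims)
    d = [-gi for gi in g]  # first direction: steepest descent
    for _ in range(epochs):
        if err < tol or dot(g, g) < 1e-20:
            break
        slope = dot(g, d)
        if slope >= 0:  # not a descent direction: restart along -g
            d = [-gi for gi in g]
            slope = dot(g, d)
        # Backtracking (Armijo) line search: an assumption, not the paper's.
        step, accepted = 1.0, False
        while step > 1e-12:
            q = [pi + step * di for pi, di in zip(p, d)]
            new_err, new_g = loss_and_grad(q, data, *dims)
            if new_err <= err + 1e-4 * step * slope:
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break  # no sufficient decrease found; stop
        beta = dot(new_g, new_g) / dot(g, g)  # Fletcher-Reeves coefficient
        d = [-ng + beta * di for ng, di in zip(new_g, d)]
        p, err, g = q, new_err, new_g
    return p, err


if __name__ == "__main__":
    xor = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
    dims = (2, 3, 1)  # inputs, hidden units, outputs
    p0 = init_params(*dims)
    e0, _ = loss_and_grad(p0, xor, *dims)
    p, e = train_cgfr(p0, xor, dims)
    print(f"sum-of-squares error: {e0:.4f} -> {e:.4f}")
```

Running the script trains the network on XOR and prints the error before and after training. Because the gains share one search direction with the weights, the gain gradients directly shape each conjugate direction. Nothing in this sketch keeps the gains positive; a practical version might constrain them.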
