A Comparison of First and Second Order Training Algorithms for Artificial Neural Networks

Minimization methods for training feedforward networks with backpropagation are compared. Feedforward network training is a special case of functional minimization in which no explicit model of the data is assumed. Because of the high dimensionality of the data, linearizing the training problem through the use of orthogonal basis functions is not desirable; the focus is therefore on functional minimization with an arbitrary basis. A number of methods based on the local gradient and Hessian matrix are discussed, and modifications of several first- and second-order training methods are considered. Experiments on share-rate data show that conjugate gradient and quasi-Newton methods outperform gradient descent methods, and that the Levenberg-Marquardt algorithm is of special interest for financial forecasting.
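For orientation, the weight-update rules underlying the compared families can be sketched as follows; the notation here is illustrative rather than the paper's own, with $w_k$ the weight vector at iteration $k$, $E$ the error function, $\eta$ a learning rate, $g_k = \nabla E(w_k)$, $H_k$ the Hessian, $J_k$ the Jacobian of the residual vector $e_k$, and $\mu$ a damping parameter:

\begin{align}
w_{k+1} &= w_k - \eta\, g_k && \text{(gradient descent)} \\
w_{k+1} &= w_k - H_k^{-1} g_k && \text{(Newton; quasi-Newton methods approximate } H_k^{-1}\text{)} \\
w_{k+1} &= w_k - \left(J_k^{\top} J_k + \mu I\right)^{-1} J_k^{\top} e_k && \text{(Levenberg-Marquardt)}
\end{align}

Conjugate gradient methods instead search along directions $d_k = -g_k + \beta_k d_{k-1}$ that are mutually conjugate with respect to the (implicit) Hessian, so second-order information is exploited without storing or inverting $H_k$.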



