The Application of an Ensemble of Boosted Elman Networks to Time Series Prediction: A Benchmark Study

In this paper, the application of multiple Elman neural networks to time series regression problems is studied. An ensemble of Elman networks is formed by boosting to enhance the performance of the individual networks, and a modified version of the AdaBoost algorithm is employed to combine the predictions from the multiple networks. Two benchmark time series data sets, i.e., the Sunspot and Box-Jenkins gas furnace problems, are used to assess the effectiveness of the proposed system. The simulation results reveal that an ensemble of boosted Elman networks can achieve better generalization and higher predictive accuracy than the individual networks. The results are compared with those from other learning systems, and the implications of the performance are discussed.
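For readers unfamiliar with boosting applied to regression, the sketch below shows one common way such an ensemble can be built and combined: an AdaBoost.R2-style training loop (Drucker, 1997) around a generic base learner, with the members merged by a weighted median of their outputs. This is a minimal illustration under stated assumptions, not the modified AdaBoost used in the paper; the make_learner factory and the scikit-learn-style fit/predict interface of the base Elman network are assumptions introduced for the example.

```python
import numpy as np

def boosted_ensemble(X, y, make_learner, n_rounds=10):
    """AdaBoost.R2-style boosting for regression (illustrative sketch).

    make_learner() is assumed to return an object exposing
    fit(X, y, sample_weight) and predict(X) -- e.g. a wrapper
    around an Elman (simple recurrent) network.
    """
    n = len(X)
    w = np.ones(n) / n                    # uniform initial sample weights
    learners, betas = [], []
    for _ in range(n_rounds):
        model = make_learner()
        model.fit(X, y, sample_weight=w)
        err = np.abs(model.predict(X) - y)
        denom = err.max()
        if denom == 0:                    # perfect fit: keep member, stop early
            learners.append(model)
            betas.append(1e-10)
            break
        L = err / denom                   # linear loss scaled to [0, 1]
        eps = np.sum(w * L)               # weighted average loss
        if eps >= 0.5:                    # member no better than chance: stop
            break
        beta = eps / (1.0 - eps)
        w *= beta ** (1.0 - L)            # shrink weights of easy examples
        w /= w.sum()                      # renormalize to a distribution
        learners.append(model)
        betas.append(beta)
    return learners, betas

def ensemble_predict(learners, betas, X):
    """Combine members by the weighted median of their predictions."""
    preds = np.array([m.predict(X) for m in learners])   # shape (T, n)
    wts = np.log(1.0 / np.array(betas))                  # member weights
    order = np.argsort(preds, axis=0)                    # sort per sample
    cum = np.cumsum(wts[order], axis=0)                  # cumulative weight
    idx = np.argmax(cum >= 0.5 * wts.sum(), axis=0)      # median position
    cols = np.arange(preds.shape[1])
    return preds[order[idx, cols], cols]
```

Any sample-weight-aware regressor can stand in for the base learner here; the weighted median, rather than a weighted mean, is what makes the combination robust to an occasional badly trained member.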
