Ensembling Adaptively Constructed Polynomial Regression Models

Subset selection in polynomial regression model building assumes that a chosen fixed set of predefined basis functions contains a subset that describes the target relation adequately. In most cases, however, the necessary set of basis functions is not known in advance and must be guessed, a potentially non-trivial and lengthy trial-and-error process. In our research we consider a potentially more efficient approach: Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions required for a model of arbitrary complexity with adequate predictive performance. However, two issues affect both subset selection and ABFC methods, especially when working with relatively small data samples: selection bias and selection instability. We address these issues through model post-evaluation using Cross-Validation and through model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used subset selection method, and to several other well-known regression modeling methods, using publicly available data sets.
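To make the proposed combination concrete, the following minimal Python sketch illustrates the two remedies under stated assumptions; it is not the actual ABFC implementation. Full polynomials of increasing degree stand in for the adaptively constructed basis function sets, every candidate model is post-evaluated with Cross-Validation, and the predictions of the best-scoring candidates are averaged into an ensemble. All variable names and the degree-based candidate set are illustrative.

    # Sketch only: candidate polynomial models are post-evaluated with
    # Cross-Validation, and the top-k candidates are averaged (ensembled)
    # instead of trusting the single CV winner.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(60, 2))   # deliberately small sample
    y = 1 + 2*X[:, 0] - 3*X[:, 0]*X[:, 1] + rng.normal(0, 0.1, 60)

    # Candidate models: here simply full polynomials of increasing degree
    # (a stand-in for the basis function sets that ABFC would construct).
    candidates = [make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
                  for d in range(1, 6)]

    # Post-evaluate each candidate with 10-fold Cross-Validation (MSE).
    cv_mse = [-cross_val_score(m, X, y, cv=10,
                               scoring="neg_mean_squared_error").mean()
              for m in candidates]

    # Ensemble: refit and average the k best-scoring candidates.
    k = 3
    best = np.argsort(cv_mse)[:k]
    for i in best:
        candidates[i].fit(X, y)
    X_new = rng.uniform(-1, 1, size=(5, 2))
    y_pred = np.mean([candidates[i].predict(X_new) for i in best], axis=0)
    print(y_pred)

Averaging several well-scoring models instead of committing to the single Cross-Validation winner is what mitigates selection instability: a small perturbation of the data may change which candidate wins, but it changes the average of the top candidates far less.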
