The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset sufficient to describe the target relation well. However, in most cases the necessary set of basis functions is not known in advance and must be guessed, a potentially non-trivial and lengthy trial-and-error process. In our research we consider a potentially more efficient approach, Adaptive Basis Function Construction (ABFC), which lets the model building method itself construct the basis functions needed to create a model of arbitrary complexity with adequate predictive performance. However, two issues to some extent plague both subset selection and ABFC methods, especially when working with relatively small data samples: selection bias and selection instability. We address these issues through model post-evaluation using cross-validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used subset selection method, and to several other well-known regression modeling methods on publicly available data sets.
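The combination the abstract describes, post-evaluating candidate models with cross-validation and then ensembling the best ones, can be illustrated with a minimal sketch. This is not the paper's ABFC algorithm: for simplicity the candidate "basis function sets" here are just polynomial degrees, and the ensemble is a plain average of the top-ranked models (both simplifying assumptions).

```python
import numpy as np

def cv_error(X, y, degree, k=5):
    """k-fold cross-validation MSE for a polynomial model of the given degree."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        coef = np.polyfit(X[train], y[train], degree)   # least-squares fit on training folds
        pred = np.polyval(coef, X[fold])                # evaluate on the held-out fold
        errs.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errs))

def ensemble_predict(X, y, X_new, degrees=(1, 2, 3, 4, 5), top=3):
    """Rank candidate models by CV error and average the predictions of the best `top`."""
    ranked = sorted(degrees, key=lambda d: cv_error(X, y, d))
    preds = [np.polyval(np.polyfit(X, y, d), X_new) for d in ranked[:top]]
    return np.mean(preds, axis=0)
```

Averaging several near-best models rather than committing to the single CV winner is one common way to reduce the selection instability mentioned above, since the ensemble is less sensitive to which model happens to win on a particular small sample.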
@article{"International Journal of Engineering, Mathematical and Physical Sciences:49735",
  author   = "Gints Jekabsons",
  title    = "Ensembling Adaptively Constructed Polynomial Regression Models",
  journal  = "International Journal of Engineering, Mathematical and Physical Sciences",
  keywords = "Basis function construction, heuristic search, model ensembles, polynomial regression",
  volume   = "2",
  number   = "2",
  pages    = "60-6",
}