Using Fractional Factorial Designs for Variable Importance in Random Forest Models

Random Forests are a powerful classification technique, consisting of a collection of decision trees. One useful feature of Random Forests is the ability to determine the importance of each variable in predicting the outcome. This is done by permuting each variable in turn and computing the change in prediction accuracy before and after the permutation. This variable importance calculation resembles a one-factor-at-a-time experiment and is therefore inefficient. In this paper, we use a regular fractional factorial design to determine which variables to permute in each run. Based on the results of the runs in the experiment, we calculate the individual importance of the variables, with improved precision over the standard method. The method is illustrated with a study of student attrition at Monash University.
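
To make the idea concrete, the sketch below (in R, not the authors' code) contrasts the standard one-factor-at-a-time permutation importance with a fractional-factorial version: a regular 2^(7-3) resolution IV design chooses which predictors to permute together in each run, and each variable's importance is then estimated as its main-effect contrast on the accuracy drop. The simulated data, the use of the randomForest and FrF2 packages, and all object names are illustrative assumptions.

    ## Minimal sketch: fractional-factorial permutation importance for a
    ## random forest (illustrative only; data and names are assumptions).
    library(randomForest)
    library(FrF2)

    set.seed(1)

    ## Toy data: 7 predictors, binary response depending on x1, x2, x3.
    n <- 500
    X <- as.data.frame(matrix(rnorm(n * 7), n, 7))
    names(X) <- paste0("x", 1:7)
    y <- factor(ifelse(X$x1 + 0.5 * X$x2 - X$x3 + rnorm(n) > 0, "A", "B"))

    train <- sample(n, 350)
    fit <- randomForest(X[train, ], y[train])
    base_acc <- mean(predict(fit, X[-train, ]) == y[-train])

    ## Regular 2^(7-3) resolution IV design: 16 runs for 7 two-level factors,
    ## coded +1 = permute that variable in the run, -1 = leave it alone.
    design <- FrF2(nruns = 16, nfactors = 7, randomize = FALSE)
    D <- sapply(design, function(f) ifelse(f == 1, 1, -1))

    ## Run the design: permute all "+1" columns at once, record accuracy drop.
    drops <- apply(D, 1, function(run) {
      Xp <- X[-train, ]
      for (j in which(run == 1)) Xp[, j] <- sample(Xp[, j])
      base_acc - mean(predict(fit, Xp) == y[-train])
    })

    ## Main effect of each factor: mean drop when permuted minus when not,
    ## i.e. the usual two-level factorial contrast (2/nruns) * t(D) %*% drops.
    importance_est <- 2 * colMeans(D * drops)
    names(importance_est) <- names(X)
    round(sort(importance_est, decreasing = TRUE), 3)

Because every run permutes several variables at once, each main-effect estimate averages information from all 16 runs rather than relying on a single permutation per variable, which is the source of the precision gain described above.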
