A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

One of the biggest challenges in nonparametric
regression is the curse of dimensionality. Additive models are known
to overcome this problem by estimating only the individual additive
effects of each covariate. However, if the additive model is misspecified,
it is not known how accurate its estimator is relative to the fully
nonparametric one. In this work, the efficiency of fully nonparametric
regression estimators, such as loess, is compared with that of estimators
that assume additivity, under both additive and non-additive regression
scenarios. The comparison is based on the oracle mean squared error of
each estimator with respect to the true nonparametric regression
function. Then, a backward
elimination variable selection procedure based on the Akaike Information
Criterion is proposed, where the criterion is computed from either the
additive or the nonparametric model. Simulations show that if the additive
model is misspecified, it can fail to select important variables more
often than the fully nonparametric approach.
A dimension reduction step is included for cases where the nonparametric
estimator cannot be computed due to the curse of dimensionality. Finally, the
Boston housing dataset is analyzed using the proposed backward
elimination procedure and the selected variables are identified.
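
To make the backward elimination idea concrete, the following is a minimal sketch in Python and not the procedure as implemented in this work. It rests on several assumptions that are not taken from the text: a Nadaraya-Watson smoother with a Gaussian product kernel stands in for loess, the bandwidth `h` is fixed rather than selected, the criterion is the improved AIC for linear smoothers, log(sigma^2) + 1 + 2(tr(H) + 1)/(n - tr(H) - 2), and the simulated non-additive model in `simulate` is purely hypothetical. All function names are illustrative.

```python
import math
import random


def nw_fit(X, y, cols, h):
    """Nadaraya-Watson fit using a Gaussian product kernel on columns `cols`.

    Returns the fitted values and the trace of the smoother matrix H.
    """
    n = len(y)
    fitted, trace = [], 0.0
    for i in range(n):
        weights = [
            math.exp(-0.5 * sum(((X[i][c] - X[j][c]) / h) ** 2 for c in cols))
            for j in range(n)
        ]
        total = sum(weights)
        fitted.append(sum(w * yj for w, yj in zip(weights, y)) / total)
        trace += weights[i] / total  # diagonal entry H_ii
    return fitted, trace


def aicc(X, y, cols, h):
    """Improved AIC of the smoother fitted on the columns in `cols`."""
    n = len(y)
    fitted, trace = nw_fit(X, y, cols, h)
    sigma2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    return math.log(sigma2) + 1.0 + 2.0 * (trace + 1.0) / (n - trace - 2.0)


def backward_elimination(X, y, h=0.2):
    """Drop one variable at a time for as long as the criterion decreases."""
    active = list(range(len(X[0])))
    best = aicc(X, y, active, h)
    while len(active) > 1:
        # Score every model obtained by deleting a single active variable.
        scores = {c: aicc(X, y, [k for k in active if k != c], h) for c in active}
        drop = min(scores, key=scores.get)
        if scores[drop] >= best:
            break  # no deletion improves the criterion
        active.remove(drop)
        best = scores[drop]
    return active


def simulate(n, seed):
    """Hypothetical non-additive model: y = 4*x0*x1 + noise; x2 is irrelevant."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [4.0 * x[0] * x[1] + rng.gauss(0.0, 0.1) for x in X]
    return X, y


X, y = simulate(200, seed=1)
selected = backward_elimination(X, y)
print("selected columns:", selected)
```

In this toy setting the two interacting covariates should be retained, because deleting either one inflates the residual variance far more than the penalty term can offset. Whether the irrelevant column is removed depends on the bandwidth and on the sample. The same loop could be driven by an additive fit instead by replacing `nw_fit` with a backfitting smoother and its trace.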



