A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the additive model is misspecified, it is unknown how accurate its estimator is relative to the fully nonparametric one. In this work, the efficiency of fully nonparametric regression estimators such as loess is compared with that of estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is made by computing the oracle mean squared error of the estimators with respect to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criterion is proposed, where the criterion is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of times it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included for cases where the nonparametric estimator cannot be computed because of the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure, and the selected variables are identified.
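
The backward elimination idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a Gaussian-kernel Nadaraya-Watson smoother with a fixed bandwidth in place of loess, and it assumes one common form of AIC for linear smoothers, n log(RSS/n) + 2 tr(S), where S is the smoother matrix. The function names, the bandwidth value, and the simulated data are all hypothetical.

```python
import numpy as np

def smoother_matrix(X, bandwidth):
    """Smoother (hat) matrix S of a Gaussian-kernel Nadaraya-Watson fit, y_hat = S @ y.

    With zero columns every kernel weight equals 1, so the fit is the sample mean.
    """
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return K / K.sum(axis=1, keepdims=True)

def aic(X, y, bandwidth):
    """Assumed AIC form for a linear smoother: n*log(RSS/n) + 2*trace(S)."""
    n = len(y)
    S = smoother_matrix(X, bandwidth)
    rss = ((y - S @ y) ** 2).sum()
    return n * np.log(rss / n) + 2.0 * np.trace(S)

def backward_elimination(X, y, bandwidth=0.2):
    """Drop one covariate at a time for as long as doing so lowers the AIC."""
    active = list(range(X.shape[1]))
    best = aic(X[:, active], y, bandwidth)
    while active:
        # AIC of each candidate model obtained by removing a single covariate
        scores = {j: aic(X[:, [k for k in active if k != j]], y, bandwidth)
                  for j in active}
        j_drop = min(scores, key=scores.get)
        if scores[j_drop] >= best:
            break  # no single removal improves the criterion
        active.remove(j_drop)
        best = scores[j_drop]
    return active

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(150, 3))  # the third covariate is pure noise
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=150)
    print("retained covariates:", backward_elimination(X, y))
```

The same loop could be run with an additive fit by replacing `smoother_matrix` with the smoother matrix of an additive estimator, which is the comparison the abstract describes.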

Input Variable Selection for RBFN-based Electric Utility's CO2 Emissions Forecasting

This study investigates the performance of radial basis function networks (RBFN) in forecasting the monthly CO2 emissions of an electric power utility. We also propose a method for input variable selection. This method is based on identifying the general relationships between groups of input candidates and the output. The effect that each input has on the forecasting error is examined by removing all inputs except the variable to be investigated from its group, calculating the network's parameters, and performing the forecast. Finally, the new forecasting error is compared with that of the reference model. Eight input variables were identified as the most relevant, significantly fewer than the 30 input variables of our reference model. The simulation results demonstrate that the model with the 8 inputs selected using the method introduced in this study performs as accurately as the reference model, while also being the most parsimonious.
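
The screening step described above can be sketched as follows. This is only an illustration under stated assumptions: the RBFN here uses Gaussian basis functions centred on a random subset of training points with output weights fitted by least squares, which may differ from the training scheme used in the study. It also reads the procedure as keeping all inputs from the other groups while retaining a single variable from the group under investigation. The group names, column indices, and data are hypothetical.

```python
import numpy as np

def design(X, centers, width):
    """Gaussian RBF activations for each center, plus a constant bias column."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-0.5 * sq_dist / width ** 2)
    return np.hstack([Phi, np.ones((len(X), 1))])

def fit_rbfn(X, y, n_centers=10, width=1.0, seed=0):
    """Pick centers from the training points, then solve for the output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    weights, *_ = np.linalg.lstsq(design(X, centers, width), y, rcond=None)
    return centers, width, weights

def predict_rbfn(model, X):
    centers, width, weights = model
    return design(X, centers, width) @ weights

def forecast_rmse(cols, X_tr, y_tr, X_te, y_te):
    """Refit the network on the chosen input columns and return the test RMSE."""
    model = fit_rbfn(X_tr[:, cols], y_tr)
    err = y_te - predict_rbfn(model, X_te[:, cols])
    return float(np.sqrt(np.mean(err ** 2)))

def screen_inputs(groups, X_tr, y_tr, X_te, y_te):
    """For each input, keep it alone from its group and compare with the reference."""
    all_cols = [c for members in groups.values() for c in members]
    reference = forecast_rmse(all_cols, X_tr, y_tr, X_te, y_te)
    change = {}
    for members in groups.values():
        others = [c for c in all_cols if c not in members]
        for c in members:
            # error change relative to the reference model with every input
            change[c] = forecast_rmse(others + [c], X_tr, y_tr, X_te, y_te) - reference
    return reference, change

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 6))
    y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=120)
    groups = {"group_a": [0, 1, 2], "group_b": [3, 4, 5]}  # hypothetical grouping
    ref, change = screen_inputs(groups, X[:90], y[:90], X[90:], y[90:])
    print("reference RMSE:", ref)
    for col, delta in sorted(change.items(), key=lambda kv: kv[1]):
        print(f"input {col}: RMSE change {delta:+.4f}")
```

An input whose reduced model matches or improves on the reference error is a candidate for retention; how the final set is assembled from these comparisons is not specified in the abstract and is left open here.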