PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Ambient air pollution with particulate matter (PM10) is a persistent problem in many countries around the world. The accumulation of a large number of measurements of both PM10 concentrations and the accompanying atmospheric factors allows statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method to build and analyze PM10 models. The empirical study uses average daily air data for the city of Pleven, Bulgaria, over a period of 5 years. The predictors in the models are seven meteorological variables, time variables, lagged PM10 variables, and some lagged meteorological variables, where each lagged variable is the original time series delayed by 1 or 2 days. The degree of influence of each predictor in the models is determined. The best CART models selected are used to forecast PM10 concentrations for the two days following the last date in the modeling period, and the forecasts are reported to be very accurate.
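The modeling setup described above (lagged predictors feeding a regression tree) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `make_lagged_rows`, `sse`, and `best_split`, the choice of temperature as the single meteorological variable, and the data values are all assumptions made for illustration. The sketch builds predictors delayed by 1 and 2 days and then finds the single binary split that minimizes the within-node sum of squared errors, which is the least-squares splitting criterion used by CART regression trees. A full tree would apply the same search recursively to each child node and then prune.

```python
from statistics import mean


def make_lagged_rows(pm10, temp):
    """Build (features, target) pairs with predictors lagged by 1 and 2 days.

    The variable names are hypothetical; the study uses seven meteorological
    variables plus time variables, which are not reproduced here.
    """
    rows = []
    for t in range(2, len(pm10)):
        features = {
            "pm10_lag1": pm10[t - 1],
            "pm10_lag2": pm10[t - 2],
            "temp": temp[t],
            "temp_lag1": temp[t - 1],
        }
        rows.append((features, pm10[t]))
    return rows


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows):
    """Return (feature, threshold, total_sse) of the best single binary split.

    Candidate thresholds are midpoints between consecutive distinct values
    of each feature; the split with the lowest left + right SSE wins.
    """
    best = None
    for name in rows[0][0]:
        values = sorted({features[name] for features, _ in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for features, y in rows if features[name] <= threshold]
            right = [y for features, y in rows if features[name] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (name, threshold, total)
    return best


if __name__ == "__main__":
    # Synthetic daily values, invented for illustration only.
    pm10 = [30, 32, 35, 60, 62, 58, 31, 29, 33, 61]
    temp = [12, 11, 9, 2, 1, 3, 10, 13, 11, 2]
    rows = make_lagged_rows(pm10, temp)
    feature, threshold, total = best_split(rows)
    print(f"root split: {feature} <= {threshold} (SSE after split: {total:.1f})")
```

In this sketch the leaf prediction would be the mean target value in each child node. Comparing the SSE reduction each variable achieves across all splits of a full tree is one common way to rank predictor influence, although the abstract does not state which importance measure the study uses.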



