Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Pulmonary function tests are important non-invasive
diagnostic tests that assess respiratory impairments and provide
quantifiable measures of lung function. Spirometry is the most
frequently used measure of lung function and plays an essential role
in the diagnosis and management of pulmonary diseases. However,
the test requires considerable patient effort and cooperation, which
are markedly related to the age of the patient, resulting in incomplete
data sets. This paper presents nonlinear models built using multivariate
adaptive regression splines (MARS) and random forest (RF) regression to
predict the missing spirometric features. Random-forest-based feature
selection is used to enhance both the generalization capability and the
model interpretability. In the present study, flow-volume data are
recorded for N = 198 subjects. The feature importance ranking
calculated by the random forest model shows that the
spirometric features FVC, FEF25, PEF, FEF25-75 and FEF50 and the
demographic parameter height are the important descriptors. A
comparison of the performance of both models shows that the
predictive ability of MARS with the top two ranked features, namely
FVC and FEF25, is higher, yielding a model fit of R² = 0.96 and
R² = 0.99 for normal and abnormal subjects, respectively. The root mean
square error (RMSE) analysis of the RF model and the MARS model also shows
that the latter predicts the missing values of FEV1 with
notably lower error values of 0.0191 (normal subjects) and 0.0106
(abnormal subjects) using the aforementioned input features. It is
concluded that combining feature selection with a prediction model
provides a minimal subset of predominant features to train the
model and yields better prediction performance. This
analysis can assist clinicians as an intelligent support system for
medical diagnosis and the improvement of clinical care.
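
The workflow summarized above (rank features with a random forest, fit a spline-type model on the top two, then report R² and RMSE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the study's measurements, the variable names and coefficients are hypothetical, and the second stage is a fixed-knot hinge-basis least-squares fit, which is a simplified stand-in for MARS (no forward or backward term selection), not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (NOT the study's measurements); names are illustrative.
rng = np.random.default_rng(0)
n = 198
names = ["FVC", "FEF25", "PEF", "height"]
X = np.column_stack([
    rng.uniform(2.0, 5.0, n),      # FVC
    rng.uniform(2.0, 8.0, n),      # FEF25
    rng.uniform(3.0, 10.0, n),     # PEF (no effect on the synthetic target)
    rng.uniform(150.0, 190.0, n),  # height (no effect on the synthetic target)
])
# Hypothetical target: linear in FVC plus a hinge in FEF25, with small noise.
y = 0.7 * X[:, 0] + 0.3 * np.maximum(0.0, X[:, 1] - 4.0) + rng.normal(0.0, 0.05, n)

train, test = slice(0, 150), slice(150, n)

# Step 1: rank features by random forest importance and keep the top two.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X[train], y[train])
ranking = np.argsort(forest.feature_importances_)[::-1]
top_two = sorted(ranking[:2].tolist())


def hinge_design(X_sub, knots):
    """Intercept plus mirrored hinge pairs max(0, x - t) and max(0, t - x)."""
    cols = [np.ones(len(X_sub))]
    for j, feature_knots in enumerate(knots):
        for t in feature_knots:
            cols.append(np.maximum(0.0, X_sub[:, j] - t))
            cols.append(np.maximum(0.0, t - X_sub[:, j]))
    return np.column_stack(cols)


# Step 2: fixed knots at training quartiles (real MARS selects knots adaptively).
knots = [np.quantile(X[train][:, j], [0.25, 0.5, 0.75]) for j in top_two]
# lstsq returns a minimum-norm solution, so redundant hinge columns are harmless.
coef, *_ = np.linalg.lstsq(
    hinge_design(X[train][:, top_two], knots), y[train], rcond=None
)
pred = hinge_design(X[test][:, top_two], knots) @ coef

# Step 3: evaluate on held-out rows with RMSE and R².
residual = y[test] - pred
rmse = float(np.sqrt(np.mean(residual ** 2)))
r2 = float(1.0 - np.sum(residual ** 2) / np.sum((y[test] - y[test].mean()) ** 2))

print("selected features:", [names[j] for j in top_two])
print("RMSE:", rmse, "R2:", r2)
```

Because the synthetic target depends only on the two stand-in features, the sketch shows the mechanics of selection and evaluation; its numbers say nothing about the values reported in the study.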




