Machine Learning for Aiding Meningitis Diagnosis in Pediatric Patients

This paper presents a Machine Learning (ML) approach to support Meningitis diagnosis in patients at a children’s hospital in São Paulo, Brazil. The aim is to use ML techniques to reduce the use of invasive procedures, such as cerebrospinal fluid (CSF) collection, as much as possible. In this study, we focus on predicting the probability of Meningitis from the results of blood and urine laboratory tests, together with an analysis of pain or other complaints reported by the patient. We tested several ML algorithms: Adaptive Boosting (AdaBoost), Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest, and Support Vector Machines (SVM). The Decision Tree algorithm performed best, with 94.56% and 96.18% accuracy on the training and testing data, respectively. These results indicate that the approach can significantly aid doctors in diagnosing Meningitis as early as possible and in sparing some children expensive and painful procedures.
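The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn is available. It is not the authors' pipeline: the data below is synthetic (generated with make_classification) rather than real laboratory results, and the function name compare_classifiers, the 70/30 split, and all hyperparameters are assumptions chosen only for illustration. It fits the seven algorithm families named in the abstract and reports training and testing accuracy for each.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Fit several classifiers and return {name: (train_acc, test_acc)}.

    Hypothetical helper: the split ratio and hyperparameters are
    illustrative assumptions, not values taken from the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy for scikit-learn classifiers
        results[name] = (
            model.score(X_train, y_train),
            model.score(X_test, y_test),
        )
    return results


# Synthetic stand-in for tabular features (NOT real patient data).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
results = compare_classifiers(X, y)

# Print models ordered by testing accuracy, highest first.
for name, (train_acc, test_acc) in sorted(
    results.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{name:20s} train={train_acc:.4f} test={test_acc:.4f}")
```

Reporting both figures side by side, as the abstract does, makes it easier to spot a model whose training accuracy is much higher than its testing accuracy, which would suggest overfitting.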




