Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Gradient boosting has proven to be a highly effective machine learning strategy, and many successful solutions have been built with XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient boosting methods: XGBoost, LightGBM, and CatBoost. The Home Credit dataset, which contains 219 features and 356,251 records, is used in this work. In addition, new features are generated, and several techniques are applied to rank and select the best of them. The experiments indicate that LightGBM is faster and more accurate than CatBoost and XGBoost across varying numbers of features and records.
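
A minimal sketch of such a three-way comparison is shown below. It uses a synthetic stand-in for the Home Credit data and default-style settings; the sample sizes, estimator counts, and other parameters here are illustrative assumptions, not the tuned configuration used in the study.

```python
# Sketch: time and score XGBoost, LightGBM, and CatBoost on the same split.
# Synthetic data stands in for the Home Credit features (assumption).
import time

from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Binary-classification data as a placeholder for the real dataset.
X, y = make_classification(n_samples=50_000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
    "LightGBM": LGBMClassifier(n_estimators=200),
    "CatBoost": CatBoostClassifier(n_estimators=200, verbose=0),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)          # measure training time
    elapsed = time.perf_counter() - start
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.4f}, train time = {elapsed:.1f}s")
```

Reporting both AUC and wall-clock training time on an identical split mirrors the speed-versus-accuracy comparison the abstract describes.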


Authors:


