Combining Bagging and Boosting

Bagging and boosting are among the most popular resampling ensemble methods: both generate and combine a diverse set of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data; however, there is strong empirical evidence that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble that combines, by voting, a bagging ensemble and a boosting ensemble of 10 sub-classifiers each. We compared this technique with plain bagging and boosting ensembles of 25 sub-classifiers, as well as with other well-known combining methods, on standard benchmark datasets, and the proposed technique was the most accurate.
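
As a concrete illustration (the paper does not give an implementation; scikit-learn, decision-tree base learners, soft voting, and the breast-cancer benchmark used below are our own assumptions), a minimal sketch of combining a bagging and a boosting ensemble of 10 sub-classifiers each by voting might look as follows:

# Minimal sketch only: scikit-learn, decision-tree base learners and the
# chosen dataset are assumptions, not the authors' exact experimental setup.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging and boosting ensembles with 10 sub-classifiers each.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=1)
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=10,
                              random_state=1)

# Combine the two ensembles by voting; soft voting (averaged class
# probabilities) is used here to avoid ties between only two voters.
combined = VotingClassifier(estimators=[("bagging", bagging), ("boosting", boosting)],
                            voting="soft")

# Estimate accuracy with 10-fold cross-validation, as is common for
# comparisons on UCI-style benchmark datasets.
scores = cross_val_score(combined, X, y, cv=10)
print("Mean accuracy: %.3f" % scores.mean())

The sketch only shows the structure of the combination; the authors' actual base learners, voting rule, and evaluation protocol may differ.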




