Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Many supervised machine learning tasks require decisions across many
distinct classes. Multi-class classification has numerous applications,
such as face recognition, text recognition, and medical diagnostics. The
objective of this article is to analyze an adaptation of Stacking to
multi-class problems that combines ensembles within the ensemble itself.
For this purpose, training similar to Stacking was used, but with three
levels: the final decision-maker (level 2) is trained on the combined
outputs of a pair of meta-classifiers (level 1) drawn from the tree-based
and Bayesian families, each of which is in turn trained by a pair of base
classifiers (level 0) of the same family. This strategy seeks to promote
diversity among the ensembles that feed the level-2 decision-maker.
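A minimal sketch of this nested architecture is given below, assuming
scikit-learn's StackingClassifier; the original study used WEKA, so the
specific learners (decision trees and naive Bayes variants) and the
level-2 combiner (logistic regression) are illustrative stand-ins for the
cited families, not the authors' exact configuration.

    # Illustrative three-level stacking; learner choices approximate the
    # paper's tree-based and Bayesian families, not its WEKA setup.
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import BernoulliNB, GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Level 0 -> level 1: each meta-classifier stacks a pair of base
    # classifiers from a single family.
    tree_ensemble = StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier()),              # C4.5-like learner
            ("stump", DecisionTreeClassifier(max_depth=1)),  # one-level tree
        ],
        final_estimator=DecisionTreeClassifier(),
        cv=5,
    )
    bayes_ensemble = StackingClassifier(
        estimators=[
            ("gaussian_nb", GaussianNB()),    # naive Bayes stand-in
            ("bernoulli_nb", BernoulliNB()),  # second Bayesian learner
        ],
        final_estimator=GaussianNB(),
        cv=5,
    )

    # Level 1 -> level 2: the final decision-maker is trained on the
    # combined outputs of the two family-specific ensembles.
    level2 = StackingClassifier(
        estimators=[("trees", tree_ensemble), ("bayes", bayes_ensemble)],
        final_estimator=LogisticRegression(max_iter=1000),  # assumed combiner
        cv=5,
    )
    # Usage: level2.fit(X_train, y_train); level2.predict(X_test)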
Three performance measures were evaluated: (1) accuracy, (2) area under
the ROC curve, and (3) execution time, each analyzed over three factors:
(a) dataset, (b) experiment, and (c) level. To compare the factors, a
three-way ANOVA was run for each performance measure over a design of 5
datasets by 25 experiments by 3 levels. A three-way interaction among the
factors was observed only for the time measure. Accuracy and area under
the ROC curve yielded similar results, each showing a two-way interaction
between level and experiment, as well as with the dataset factor. It was
concluded that level 2 performed above the other levels on average and
that the proposed method is especially effective for multi-class problems
when compared to binary problems.
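As a hedged illustration of the statistical analysis, a three-way ANOVA
of this kind could be run with pandas and statsmodels as below; the file
name scores.csv and its column names are hypothetical, and testing the
triple interaction assumes replicated observations within each
dataset-experiment-level cell.

    # Sketch of the 5 x 25 x 3 three-way ANOVA; names are placeholders.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # One row per observation: the measured value (accuracy, AUC, or time)
    # together with its dataset, experiment, and level labels.
    df = pd.read_csv("scores.csv")

    # Full factorial model: main effects plus all two- and three-way
    # interactions among the categorical factors.
    model = ols("value ~ C(dataset) * C(experiment) * C(level)",
                data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))
    # The C(dataset):C(experiment):C(level) row tests the triple interaction.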