Using Genetic Programming to Evolve a Team of Data Classifiers

The purpose of this paper is to demonstrate the ability of a genetic programming (GP) algorithm to evolve a team of data classification models. The GP algorithm used in this work is "multigene" in nature, i.e. it uses multiple tree structures (genes) to represent the team members. Each team member assigns a data sample to one of a fixed set of output classes. The final class prediction is made by majority vote, i.e. by taking the mode (the most frequently occurring value) of the classes predicted by the individual genes. The algorithm is tested on a binary classification problem. For the case study investigated, the algorithm produces compact classification models whose accuracy is comparable to that of alternative approaches.
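The team-voting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GP algorithm used in this work: the expression-tree encoding (feature terminals written as strings such as "x0", constants as floats), the operator set, the helper names (`evaluate`, `gene_class`, `team_predict`, all hypothetical) and the rule that maps a gene's numeric output to a class (output > 0 gives class 1, otherwise class 0) are assumptions. The genes are also fixed by hand here rather than evolved.

```python
import operator
from collections import Counter
from typing import List, Sequence, Union

# Assumed gene encoding: a tree is a feature terminal ("x0", "x1", ...),
# a float constant, or a tuple (op, left_subtree, right_subtree).
Tree = Union[str, float, tuple]

# Hypothetical function set for illustration only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree: Tree, x: Sequence[float]) -> float:
    """Recursively evaluate one gene (expression tree) on a data sample x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    if isinstance(tree, str):
        return x[int(tree[1:])]  # feature terminal "xi" -> x[i]
    return float(tree)  # numeric constant terminal


def gene_class(tree: Tree, x: Sequence[float]) -> int:
    """Assumed binary decision rule: class 1 if the gene output is positive."""
    return 1 if evaluate(tree, x) > 0 else 0


def team_predict(genes: List[Tree], x: Sequence[float]) -> int:
    """Majority vote: the mode of the classes predicted by the genes."""
    votes = [gene_class(g, x) for g in genes]
    # most_common orders equal counts by first occurrence, so an even-sized
    # team breaks ties in favour of the class voted first.
    return Counter(votes).most_common(1)[0][0]


# A hand-written three-gene team (hypothetical, not an evolved model).
TEAM: List[Tree] = [
    ("-", "x0", "x1"),  # x0 - x1
    ("+", "x0", -0.5),  # x0 - 0.5
    ("-", "x1", 0.5),   # x1 - 0.5
]

print(team_predict(TEAM, [1.0, 0.2]))  # votes 1, 1, 0 -> class 1
print(team_predict(TEAM, [0.1, 0.9]))  # votes 0, 0, 1 -> class 0
```

With an odd number of genes a binary vote can never tie, which is one reason to prefer odd team sizes in a sketch like this; for other team sizes or more than two classes, the tie-breaking rule becomes a design decision in its own right.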



