Genetic Programming Based Data Projections for Classification Tasks

In this paper we present a GP-based method for automatically evolving projections, so that data can be classified more easily in the projected space. At the same time, our approach can reduce dimensionality by constructing more relevant attributes. The fitness of each projection measures how easy it is to classify the dataset after applying the projection. This is computed quickly by a Simple Linear Perceptron. We have tested our approach on three domains. The experiments show that it obtains good results compared to other Machine Learning approaches, while reducing dimensionality in many cases.
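
The fitness evaluation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of training accuracy as the fitness value, the learning rate, and the epoch limit are all assumptions made for illustration. A candidate projection is applied to every example, a linear perceptron is trained on the projected data, and the resulting accuracy serves as the projection's fitness.

```python
import random


def perceptron_accuracy(X, y, epochs=50, lr=0.1, seed=0):
    """Train a linear perceptron on labels in {0, 1}; return training accuracy.

    Hypothetical helper: epochs, lr and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))

    def predict(x):
        return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

    for _ in range(epochs):
        rng.shuffle(order)
        errors = 0
        for i in order:
            delta = y[i] - predict(X[i])
            if delta != 0:
                # Classic perceptron update on a misclassified example.
                errors += 1
                w = [wj + lr * delta * xj for wj, xj in zip(w, X[i])]
                b += lr * delta
        if errors == 0:
            break  # data is linearly separable and the perceptron has converged

    correct = sum(1 for xi, yi in zip(X, y) if predict(xi) == yi)
    return correct / len(X)


def projection_fitness(projection, X, y):
    """Fitness of a projection: perceptron accuracy in the projected space.

    Assumption: accuracy on the training data is used as the fitness score.
    """
    projected = [projection(x) for x in X]
    return perceptron_accuracy(projected, y)


# XOR-like toy data: class 1 when both coordinates share the same sign.
X = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
y = [1, 0, 0, 1]


def identity(x):
    # Keeps the original two attributes (not linearly separable).
    return list(x)


def product(x):
    # A projection GP could evolve: one constructed attribute x0 * x1.
    return [x[0] * x[1]]


fitness_identity = projection_fitness(identity, X, y)
fitness_product = projection_fitness(product, X, y)
print(fitness_identity, fitness_product)
```

In this toy case the constructed attribute makes the classes linearly separable while also reducing the data from two dimensions to one, so the product projection receives the higher fitness. In a GP setting, `projection` would be an evolved expression tree rather than a hand-written function.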




