Initializing K-Means using Genetic Algorithms

K-Means (KM) is considered one of the major algorithms widely used in clustering. However, it still has some problems, and one of them is in its initialization step where it is normally done randomly. Another problem for KM is that it converges to local minima. Genetic algorithms are one of the evolutionary algorithms inspired from nature and utilized in the field of clustering. In this paper, we propose two algorithms to solve the initialization problem, Genetic Algorithm Initializes KM (GAIK) and KM Initializes Genetic Algorithm (KIGA). To show the effectiveness and efficiency of our algorithms, a comparative study was done among GAIK, KIGA, Genetic-based Clustering Algorithm (GCA), and FCM [19].




References:
[1] R.J. Bauer, "Genetic Algorithms and Investment Strategies", Wiley
Finance Editions, New York, John Wiley and Sons, 1994.
[2] H. Ben Amor, and A. Rettinger, "Intelligent exploration for genetic
algorithms: Using self-organizing maps in evolutionary computation", In
proceedings of the 2005 conference on Genetic and evolutionary
computation, 2005, pp. 1531-1538.
[3] P. Bradley, and U. Fayyad, "Refining Initial Points for K-Means
Clustering," In Proceeding of 15th International Conference on Machine
Learning, 1998, pp. 91-99.
[4] P. Chi, "Genetic Search with Proportion Estimation", In proceedings of
the Third Int. Con. on Genetic Algorithms (ICGA), San Mateo,
California, 1989, pp. 92-97.
[5] U. Fayyad, C. Reina, and J. Bradley, "Initialization of iterative
refinement clustering algorithms", In proceedings of Fourth Int. Con. on
Knowledge Discovery and Data Mining, AAAI, pp.194-198, 1998.
[6] C. Fraley, and A. Raftery, "How Many Clusters? Which Clustering
Method? Answers Via Model-Based Cluster Analysis," Technical
Report 329, Dept. of Statistics, University of Washington, Seattle, 1998.
[7] D. Goldberg, "Computer-Aided Gas Pipeline Operation Using Genetic
Algorithms And Rule Learning," Ph.D. thesis, University of Michigan,
Ann Arbor, 1983.
[8] D. Goldberg, "Genetic Algorithms In Search, Optimization, And
Machine Learning," Addison-Wesley, New York, 1989.
[9] J. Grefenstette, C. Ramsey, and A. Schultz, Learning Sequential
Decision Rules Using Simulation Models And Competition, Machine
Learning 5, Vol. 4, 1990, pp. 355-381.
[10] L. Hall, B. Ozyurt, and J. Bezdek, "Clustering With A Genetically
Optimized Approach," IEEE Transactions on Evolutionary computation,
Vol. 3, No. 2, 1999, pp 103-112.
[11] J. Han, and M. Kamber, "Data Mining: Concepts And Techniques,"
Morgan Kaufmann Publishers, 2001.
[12] J. Holland, "Adaptation In Natural And Artificial Systems," University
of Michigan Press, 1975.
[13] L. Kaufman, and P. Rousseeuw, "Finding Groups In Data: And
Introduction To Cluster Analysis," John Wiley and Sons, 1990.
[14] A. Korol, I. Preygel, and S. Preygel, "Algorithms Of Estimation And
Population-Genetic Models," Central Library of Imperial College,
Chapman and Hall, 1994.
[15] K. Krishna, and M. Murty, "Genetic K-Means Algorithm," IEEE
Transactions on Systems, Vol. 29, NO. 3, 1999, pp. 433-439.
[16] Y. Lu, S. Lu, F. Fotouhi, Y. Deng, and S. Brown, "FGKA: A Fast
Genetic K-Means Clustering Algorithm," ACM Symposium on Applied
Computing, 2004
[17] Y. Lu, S. Lu, F. Fotouhi, Y. Deng, and S. Brown, "Incremental Genetic
K-Means Algorithm And Its Application In Gene Expression Data
Analysis," BMC Bioinformatics, 2004.
[18] J. MacQueen, "Some Methods For Classification And Analysis Of
Multivariate Observations," In proceedings of the Fifth Berkeley
Symposium on Mathematical Statistics and Probability, University of
California Press, 1967, pp. 281-297.
[19] U. Maulik, and S. Bandyopadhyay, "Genetic Algorithm-Based
Clustering Technique", Pattern Recognition 33, 1999, pp. 1455-1465.
[20] J. Pena, J. Lozano, and P. Larranaga, "An Empirical Comparison of Four
Initialization Methods For The K-Means Algorithm," Pattern
Recognition Letters, Vol. 20 No. 10, 1999, pp. 1027-10.