A Genetic Algorithm for Clustering on Image Data

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.





References:
[1] P. K. Agarwal and C. M. Procopiuc, "Exact and approximation
algorithms for clustering," in Proceedings of the ninth annual ACMSIAM
symposium on Discrete algorithms, 1998, pp. 658-667.
[2] P. J. Angeline, "Adaptive and self-adaptive evolutionary computations,"
Computational Intelligence: A Dynamic System Perspective, Piscataway,
IEEE Press, 1995, pp. 152-163.
[3] A. Demiriz, K. P. Bennett, and M. J. Embrechts, "Semi-supervised
clustering using genetic algorithms," R.P.I. Math Report No. 9901,
Rensselaer Polytechnic Institute, 1999.
[4] W. DuMouchel, C. Volinsky, T. Johnson, C. Cortes, and D. Pregibon,
"Squashing flat files flatter," in Proceedings of the ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, 1999, pp. 6-15.
[5] V. Estivill-Castro and A.T. Murray, "Spatial clustering for data mining
with genetic algorithms," in Proceedings of the International ICSC
Symposium on Engineering of Intelligent Systems, 1998, pp. 317-323.
[6] J. Grabmeier and A. Rudolph, "Techniques of cluster algorithms in data
mining," Data Mining and Knowledge Discover, 6, 2002, pp. 303-360.
[7] L. O Hall, I. B. Ozyurt, , J. C. Bezdek, "Clustering with a genetically
optimized approach," IEEE Transactions on Evolutionary Computation,
3(2), 1999, pp. 103-112.
[8] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan
Kaufmann Publishers, 2000.
[9] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review,"
ACM Computing Surveys, 31(3), 1999, pp. 264-323.
[10] C.-Y. Lee and E. K.Antonsson, "Dynamic partitional clustering using
evolution strategies," in Proceedings of the Third Asia Pacific
Conference on Simulated Evolution and Learning, 2000.
[11] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1998.
[12] M. Painho and F. Bação, "Using genetic algorithms in clustering
problems," in Proceedings of GeoComputation Conference, 2000.