A New Evolutionary Algorithm for Cluster Analysis

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.





References:
[1] Yi-Tung Kao, Erwie Zahara and I-Wei Kao, "A hybridized approach to
data clustering" Expert Systems with Applications, Volume 34, Issue 3,
pp. 1754-1762, April 2008.
[2] J.F. Lu, J.B. Tang, Z.M. Tang and J.Y. Yang, "Hierarchical
initialization approach for K-Means clustering", Pattern Recognition
Letters, January 2008.
[3] Hong Zhou and Yonghuai Liu, "Accurate integration of multi-view
range images using k-means clustering", Pattern Recognition, Volume
41, Issue 1, pp. 152-175, January 2008.
[4] Michael Laszlo and Sumitra Mukherjee , "A genetic algorithm that
exchanges neighboring centers for k-means clustering", Pattern
Recognition Letters, Volume 28, Issue 16, pp. 2359-2366, 1 December
2007.
[5] Amir Ahmad and Lipika Dey , "A k-mean clustering algorithm for
mixed numeric and categorical data", Data & Knowledge
Engineering, Volume 63, Issue 2, pp. 503-527, November 2007.
[6] Chung-Chian Hsu, Chin-Long Chen and Yu-Wei Su , "Hierarchical
clustering of mixed data based on distance hierarchy", Information
Sciences, Volume 177, Issue 20, 15, pp. 4474-4492, October 2007.
[7] Mohammad Fathian and Babak Amiri, "A honey-bee mating approach
on clustering", The International Journal of Advanced Manufacturing
Technology, DOI 10.1007/s00170-007-1132-7.
[8] Stephen J. Redmond and Conor Heneghan , "A method for initialising
the K-means clustering algorithm using kd-trees", Pattern Recognition
Letters, Volume 28, Issue 8, pp. 965-973, June 2007.
[9] Georgios P. Papamichail and Dimitrios P. Papamichail , "The k-means
range algorithm for personalized data clustering in e-commerce",
European Journal of Operational Research, Volume 177, Issue 3, pp.
1400-1408, March 2007
[10] R.J. Kuo, H.S. Wang, Tung-Lai Hu and S.H. Chou , "Application of ant
K-means on clustering analysis", Computers & Mathematics with
Applications, Volume 50, Issues 10-12, pp. 1709-1724, November-
December 2005.
[11] P. S. Shelokar, V. K. Jayaraman and B. D. Kulkarni ",An ant colony
approach for clustering ",Analytica Chimica Acta, Volume 509, Issue
2, pp. 187-195, May 2004.
[12] Michael K. Ng and Joyce C. Wong , "Clustering categorical data sets
using tabu search techniques", Pattern Recognition, Volume 35, Issue
12, December 2002, pp. 2783-2790
[13] C. S. Sung and H. W. Jin, "A tabu-search-based heuristic for
clustering", Pattern Recognition, Volume 33, Issue 5, pp. 849-858, May
2000.
[14] Khaled S. Al-Sultan, "A Tabu search approach to the clustering
problem", Pattern Recognition, Volume 28, Issue 9, pp. 1443-
1451, September 1995.
[15] Michael Laszlo and Sumitra Mukherjee, "A genetic algorithm that
exchanges neighboring centers for k-means clustering" Pattern
Recognition Letters, Volume 28, Issue 16, pp. 2359-2366, December
2007
[16] J. Kennedy and R. Eberhart, "Particle Swarm Optimization," IEEE
International Conf. on Neural Networks, Piscataway, NJ, vol. 4, pp.
1942-1948, 1995.
[17] J. Olamaei , T. Niknam and G. Gharehpetian, "Application of Particle
Swarm Optimization for Distribution Feeder Reconfiguration
Considering Distributed Generators", accepted for future publication in
Applied Mathematics and Computation journal, 2008.
[18] Tai-Hsi Wu, Chin-Chih Chang and Shu-Hsing Chung, "A simulated
annealing algorithm for manufacturing cell formation problems",
Expert Systems with Applications, Volume 34, Issue 3, April 2008,
Pages 1609-1617.
[19] Klaus Meer, "Simulated Annealing versus Metropolis for a TSP
instance" Information Processing Letters, Volume 104, Issue 6, 16
December 2007, Pages 216-219