A New Algorithm for Cluster Initialization

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.





References:
[1] J. Mac Queen, Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symp. Math. Stat. and prob, 1967, pp. 281-97. [2] P. Bradley, U. Fayyad, Refining initial points for k-means clustering, Proceedings 15th International Conf, on Machine Learning, San Francisco, CA, 1998, pp. 91-99. [3] N. Nasrabadi and R. King, Image coding using vector quantization: a review. IEEE trans. Comm. Vol. 36 (8), 1988, pp. 957-970. [4] J. Pena, J. Lozano and P. Larranaga, An Empirical comparison of four initialization methods for the k-means algorithm, Pattern Recognition Letters Vol. 20, 1999, pp. 1027-1040. [5] M. Ismail and M. Kamel, Multidimensional data clustering utilization hybrid search strategies. Pattern Recognition Vol. 22 (1), 1989, pp. 75-89. [6] G. Babu and M. Murty, A near optimal initial seed value selection in kmeans algorithm using a genetic algorithm. Pattern Recognition Letters Vol. 14, 1993, pp. 763-769. [7] C. Huang and R. Harris, A Comparison of several vector quantization codebook generation approaches. IEEE trans. Image Proc. Vol 2 (1), 1993, pp. 108-112. [8] Y. Linde, A. Buzo and R. Gray, An algorithm for vector quantizer design. IEEE trans. Comm. Vol. 28 (1), 1980, pp. 84-95. [9] N. Venkateswarlu and P. Raju. Fast isodata clustering algorithms. pattern recognition Vol. 25 (3), 1992, pp. 335-342. [10] A. Gersho and R. Gray, Vector quantization and signal compression, CAP, 1992. [11] M. Meila and D. Heckerman, An experimental comparison of several clustering methods, Microsoft Research Technical Report MSR-TR-98-06, Redmond, WA, 1998.