Abstract: The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.
Abstract: In this paper, we propose an algorithm to compute
initial cluster centers for K-means clustering. Data in a cell is
partitioned using a cutting plane that divides cell in two smaller cells.
The plane is perpendicular to the data axis with the highest variance
and is designed to reduce the sum squared errors of the two cells as
much as possible, while at the same time keep the two cells far apart
as possible. Cells are partitioned one at a time until the number of
cells equals to the predefined number of clusters, K. The centers of
the K cells become the initial cluster centers for K-means. The
experimental results suggest that the proposed algorithm is effective,
converge to better clustering results than those of the random
initialization method. The research also indicated the proposed
algorithm would greatly improve the likelihood of every cluster
containing some data in it.