Abstract: This paper presents a supervised clustering algorithm,
namely Grid-Based Supervised Clustering (GBSC), which is able to
identify clusters of any shapes and sizes without presuming any
canonical form for data distribution. The GBSC needs no prespecified
number of clusters, is insensitive to the order of the input
data objects, and is capable of handling outliers. Built on the
combination of grid-based clustering and density-based clustering,
under the assistance of the downward closure property of density
used in bottom-up subspace clustering, the GBSC can notably reduce
its search space to avoid the memory confinement situation during its
execution. On two-dimension synthetic datasets, the GBSC can
identify clusters with different shapes and sizes correctly. The GBSC
also outperforms other five supervised clustering algorithms when
the experiments are performed on some UCI datasets.
Abstract: Clustering algorithms help to understand the hidden
information present in datasets. A dataset may contain intrinsic and
nested clusters, the detection of which is of utmost importance. This
paper presents a Distributed Grid-based Density Clustering algorithm
capable of identifying arbitrary shaped embedded clusters as well as
multi-density clusters over large spatial datasets. For handling
massive datasets, we implemented our method using a 'sharednothing'
architecture where multiple computers are interconnected
over a network. Experimental results are reported to establish the
superiority of the technique in terms of scale-up, speedup as well as
cluster quality.