Density Clustering Based On Radius of Data (DCBRD)

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density based clustering algorithm (DCBRD) is presented, relying on a knowledge acquired from the data by dividing the data space into overlapped regions. The proposed algorithm discovers arbitrary shaped clusters, requires no input parameters and uses the same definitions of DBSCAN algorithm. We performed an experimental evaluation of the effectiveness and efficiency of it, and compared this results with that of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is significantly efficient in discovering clusters of arbitrary shape and size.





References:
[1] Beckmann N., Kriegel H.-P., Schneider R, and Seeger B. "The R*-
tree: An Efficient and Robust Access Method for Points and
Rectangles", Proc. ACM SIGMOD Int. Conf. on Management of
Data, Atlantic City, NJ, 1990, pp. 322-331.
[2] DEFAYS, D. An efficient algorithm for a complete link method.
The Computer Journal, vol. 20, 1977, pp. 364-366.
[3] Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases
with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and
Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.
[4] Ester M., Kriegel H-P., Sander J., Xu X. "Clustering for Mining in
Large Spatial Databases", Special Issue on Data Mining, KIJournal,
ScienTec Publishing, Vol. 1, 1998, pp. 1-7.
[5] Fayyad U., Piatetsky G., Smyth P., Uthurusay R., "Advances in
knowledge discovery", AAAI press, Cambridge,1996.
[6] Gordon A. D. "A review of hierarchical classification", Journal of
the Royal statistical society. Series A, Vol.150, 1987, pp. 119-137.
[7] Guha S., Rastogi R., Shim K.: "CURE: An Efficient Clustering
Algorithms for Large Databases", Proc. ACM SIGMOD Int. Conf.
on Management of Data, Seattle, WA, 1998, pp. 73-84.
[8] HAN J., KAMBER M., and TUNG A. K. H. "Spatial clustering
methods in data mining: A survey". Taylor and Francis, 2001.
[9] Krznaric D. and Levcopoulos C. "Optimal algorithms for complete
linkage clustering in d dimensions". Theor. Comput. Sci., 286(1),
2002, pp. 139-149.
[10] Kriegel. H-P., Peer K., and Irina G., "Incremental OPTICS:
Efficient Computation of Updates in a Hierarchical Cluster
Ordering.", Proc. 5th Int. Conf. on Data Warehousing and
Knowledge Discovery (DaWaK'03), Prague, Czech Rep., 2003, pp.
224-233.
[11] Kaufman L., Rousseeuw P. J.: "Finding Groups in Data: An
Introduction to Cluster Analysis", John Wiley & Sons, 1990.
[12] MacQueen, J.: "Some Methods for Classification and Analysis of
Multivariate Observations", 5th Berkeley Symp. Math. Statist.
Prob., Vol. 1, 1967, pp. 281-297.
[13] Ng R. T., Han J.: "Efficient and Effective Clustering Methods for
Spatial Data Mining", Proc. 20th Int. Conf. On Very Large Data
Bases, Santiago, Chile, Morgan Kaufmann Publishers, San
Francisco, CA, 1994, pp. 144-155.
[14] Sibson R.: "SLINK: an optimally efficient algorithm for the singlelink
cluster method". The Computer Journal, Vol. 16, No. 1, 1973,
pp. 30-34.
[15] Shi-hong Y.,Ping L., Ji-dog G.and Shui-geng Z."A Statistical
Information-based clustering Approach in distance space", JZUS,
vol. 6A(1), 2005, pp. 71-78.
[16] Voorhees, E.M. "Implementing agglomerative hierarchical
clustering algorithms for use in document retrieval". Information
Processing and Management, 22, 6, 1986, pp. 465-476.
[17] Zhang T., Ramakrishnan R., Linvy M.: "BIRCH: An Efficient Data
Clustering Method for Very Large Databases". Proc. ACM
SIGMOD Int. Conf. on Management of Data, ACM Press, New
York, 1996, pp.103-114.