Grid-based Supervised Clustering - GBSC

This paper presents a supervised clustering algorithm, namely Grid-Based Supervised Clustering (GBSC), which is able to identify clusters of any shapes and sizes without presuming any canonical form for data distribution. The GBSC needs no prespecified number of clusters, is insensitive to the order of the input data objects, and is capable of handling outliers. Built on the combination of grid-based clustering and density-based clustering, under the assistance of the downward closure property of density used in bottom-up subspace clustering, the GBSC can notably reduce its search space to avoid the memory confinement situation during its execution. On two-dimension synthetic datasets, the GBSC can identify clusters with different shapes and sizes correctly. The GBSC also outperforms other five supervised clustering algorithms when the experiments are performed on some UCI datasets.




References:
[1] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic
subspace slustering of high dimensional data for data mining
applications," In Proc. ACM SIGMOD International Conference on
Management of Data, Seattle, Washington, June 1998, pp. 94-105.
[2] J. S. Aguilar, R. Ruiz, J. C. Riquelme, and R. Giraldez, "SNN: A
supervised clustering algorithm," In Proc. 14th International Conference
on Industrial & Engineering Applications of Artificial Intelligence &
Expert Systems (IEA/AIE 2001)
[3] S. H. Al-Harbi, and V. J. Rayward-Smith, "Adaptive k-means for
supervised clustering," Applied Intelligence, Volume 24, Number 3, pp.
219-226(8), June 2006.
[4] T. Finley and T. Joachims, "Supervised clustering with support vector
machines," In Proc. International conference on Machine learning,
Bonn, Germany, August 07 - 11, 2005, pp. 217-224.
[5] A. Jirayusakul, "Supervised growing neural gas algorithm in clustering
analysis," Ph.D. dissertation, School of Applied Statistics, National
Institute of Development Administration, Thailand, 2007, unpublised.
[6] A. Jirayusakul, and S. Auwatanamongkol, "A supervised growing neural
gas algorithm for cluster analysis," International Journal of Hybrid
Intelligent Systems, Vol. 4, No.2, 2007.
[7] S. B. Kotsiantis and P. E. Pintelas, "Recent Advances in Clustering: A
Brief Survey," Transactions on Information Science and Applications,
2004,1(1):73-81.
[8] X. Li and N. Ye, "Grid-and Dummy-Cluster-Based Learning of Normal
and Intrusive Clusters of Computer Intrusion Detection," Journal of
Quality and Reliability Engineering International, Vol. 18, No. 3, pp.
231-242.
[9] X. Li and N. Ye, "A Supervised Clustering Algorithm for Computer
Intrusion Detection," Knowledge and Information Systems, Vol. 8,
No.4, pp. 498-509.
[10] X. Li and N. Ye, "A Supervised Clustering and Classification Algorithm
for Mining Data With Mixed Variables," IEEE Transactions on Systems,
Man, and Cybernetics-Part A, Vol. 36, No. 2, pp. 396-406.
[11] N. H. Park and W. S. Lee, "Statistical Grid-based Clustering over Data
Streams," SIGMODD Record, Vol.33, No.1, March 2004.
[12] L. Parsans, E. Haque, and H. Liu, "Subspace Clustering for High
Dimensional Data: A Review," ACM SIGKDD Explorations Newsletter,
6(1):90-105, June 2004.
[13] C. M. Procopiuc, 1997, "Clustering Problems and their Applications (a
Survey)," Available: http://www.cs.duke.edu/~magda.
[14] G. Sheikholeslami, S. Chatterjee, and A. Zhang, "WaveCluster: A Multi-
Resolution Clustering Approach for Very Large Spatial Databases," In
Proc. International Conference on Very Large Databases, New York
City, August 24-27, 1998.
[15] J. Sinkkonen, S. Kaski, and J. Nikkila, "Discriminative Clustering:
Optimal Contingency Tables by Learning Metrics," In Proc. European
Conference on Machine Learning (ECML-02), Springer-Veriag,
London, pp. 418-430.
[16] N. Slonim and N. Tishby, "Agglomerative Information Bottleneck," In
Proc. Neural Information Processing Systems, pp. 617-623, 1999.
[17] P-N. Tan, M. Steinbach, and V. Kumar, "Introduction to Data Mining.
Boston: Pearson Education, Inc., pp. 604-608.
[18] N. Tishby, F. C. Pereira, and W. Bialek, "The Information Bottleneck
Method," In Proceedings of the Allerton Conference on
Communication and Computation, 1999.
[19] Y. Qu and Z. Xu, "Supervised Clustering Analysis for Microarray Data
Based on Multivariate Gauussian Mixture," Bioinformatics. 20 (Aug) :
1275-1288.
[20] University of California at Irving, Machine Learning Repository.
Available: http:/www.ics.edu/~mlearn/MLRepository.html
[21] N. Ye and X. Li, "A Scalable Clustering Technique for Intrusion
Signature Recognition," In Proc. The 2001 IEEE Workshop on
Information Assurance and Security United States Military Academy,
West Point, NY, 5-6 June, 2001.
[22] N. Ye and X. Li, 2005, "Method for Classifying Data using Clustering
and Classification Algorithm Supervised," Available:
http://www.patentstorm.us/patents/6907436/fulltext.html.
[23] N. Zeidat, C. F. Eick, and Z. Zhao, "Supervised Clustering: Algorithms
and Applications," Available:
http://www2.cs.uh.edu/~ceick/kdd/ZEZ06.pdf.