ISC – Intelligent Subspace Clustering: A Density-Based Clustering Approach for High-Dimensional Dataset

Many real-world data sets have a very high-dimensional feature space. Most clustering techniques build clusters using the distance or similarity between objects. However, in high-dimensional spaces, distances between points become relatively uniform, so distance alone discriminates poorly between near and far neighbors. In such cases, density-based approaches may give better results. Subspace clustering algorithms automatically identify the lower-dimensional subspaces of a high-dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of existing state-of-the-art techniques. First, ISC determines input parameters such as the ε-distance at the various levels of subspace clustering, which helps in finding meaningful clusters. Second, because a uniform-parameter approach is not suitable for different kinds of databases, ISC determines meaningful clustering parameters dynamically and adaptively using a hierarchical filtering approach. Third, and most important, ISC supports incremental learning and the dynamic inclusion and exclusion of subspaces, which leads to better cluster formation.
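The claim that distances become relatively uniform in high dimensions (often called distance concentration) can be checked numerically. The sketch below is not part of ISC. It assumes points drawn uniformly from the unit hypercube, and the helper name `relative_contrast` is hypothetical. It measures (max distance - min distance) / min distance from one random query point to a sample of points; a small value means the nearest and farthest points are almost equally far away, which is why a single global distance threshold such as ε becomes hard to choose in the full space.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Hypothetical helper: (d_max - d_min) / d_min for Euclidean distances
    from one random query point to n_points uniform points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(500, dim), 3))
```

Running the script prints one contrast value per dimensionality; under these assumptions the values fall sharply from 2 to 1000 dimensions, which motivates searching for clusters in lower-dimensional subspaces rather than in the full feature space.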
