Categorical Clustering By Converting Associated Information

The lack of an inherent, "natural" dissimilarity measure between objects in a categorical dataset presents special difficulties for clustering analysis. However, each categorical attribute of a given dataset induces a natural probability distribution and therefore carries information in the sense of Shannon. In this paper, we propose a novel method that heuristically converts categorical attributes to numerical values by exploiting this associated information. We conduct an experimental study on a real-life categorical dataset, and the experiment demonstrates the effectiveness of our approach.
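
To make the idea concrete, the following is a minimal sketch, not the paper's exact algorithm: it assumes the conversion maps each categorical value to its Shannon self-information, -log2 p(value), estimated from the attribute's empirical frequencies, and then clusters the resulting numerical matrix with standard k-means. The function name information_encode and the toy dataset are hypothetical and only illustrate the general approach.

    import numpy as np
    from collections import Counter
    from sklearn.cluster import KMeans

    def information_encode(rows):
        """Replace each categorical value with -log2 of its empirical
        probability, estimated per attribute (column).
        `rows` is a list of equal-length tuples of category labels."""
        data = np.array(rows, dtype=object)
        n, m = data.shape
        encoded = np.zeros((n, m))
        for j in range(m):
            counts = Counter(data[:, j])          # frequency of each category in column j
            for i in range(n):
                p = counts[data[i, j]] / n        # empirical probability of the observed value
                encoded[i, j] = -np.log2(p)       # its Shannon self-information
        return encoded

    # Hypothetical toy dataset with two categorical attributes.
    rows = [("red", "small"), ("red", "small"), ("blue", "large"),
            ("blue", "large"), ("green", "small")]
    X = information_encode(rows)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)

Under this assumed encoding, rare categories receive larger numerical values than common ones, so any standard numerical clustering algorithm can be applied to the transformed data.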



