Hierarchical Clustering Algorithms in Data Mining

Clustering is the process of grouping data objects into clusters so that objects in the same cluster are similar to each other and dissimilar to objects in other clusters. Clustering is one of the main areas of data mining, and its algorithms can be classified as partitioning, hierarchical, density-based and grid-based. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON and BIRCH. The resulting state-of-the-art review of these algorithms will help in addressing their current problems and in deriving more robust and scalable clustering algorithms.
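
To make the idea of hierarchical clustering concrete, the following is a minimal sketch of plain single-linkage agglomerative clustering. It is not CURE, ROCK, CHAMELEON or BIRCH; it only illustrates the bottom-up merging that such algorithms refine. The function name `single_linkage`, the parameter `k` and the sample points are assumptions introduced here for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest single-linkage distance,
        # i.e. the smallest distance between any two of their members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i stays valid
    return clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
clusters = single_linkage(points, k=2)
```

This naive version recomputes all pairwise distances at every merge, so its cost grows quickly with the number of points; scalability of this kind is one of the concerns that the surveyed algorithms are designed to address.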
