A Survey: Clustering Ensembles Techniques

Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability and accuracy of unsupervised classification solutions. Many contributions have been made so far toward finding a consensus clustering, and one of the major problems in clustering ensembles is the design of the consensus function. In this paper, we first introduce clustering ensembles, the representation of multiple partitions and the associated challenges, and present a taxonomy of combination algorithms. Second, we describe the consensus functions used in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association based functions and the finite mixture model, and explain their advantages, disadvantages and computational complexity. Finally, we compare clustering ensemble algorithms in terms of computational complexity, robustness, simplicity and accuracy, as reported for previous techniques on different datasets.
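As a concrete illustration of one of the consensus functions named above, the following is a minimal sketch of a co-association based combination. It is not taken from the paper: the function names `co_association` and `consensus_by_threshold`, the 0.5 threshold, and the toy partitions are assumptions chosen for illustration. Each input partition is a list of cluster labels, one per object; objects that share a cluster in more than half of the partitions are linked, and the connected groups form the consensus clusters.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions that place objects i and j in the same cluster."""
    n_objects = len(partitions[0])
    counts = [[0] * n_objects for _ in range(n_objects)]
    for labels in partitions:
        for i in range(n_objects):
            for j in range(n_objects):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_by_threshold(co, threshold=0.5):
    """Link objects whose co-association exceeds `threshold` (an assumed
    value) and label each connected group as one consensus cluster."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal over the thresholded co-association graph.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy partitions of six objects. Label values are arbitrary names,
# so the first two partitions describe the same grouping.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
co = co_association(partitions)
consensus = consensus_by_threshold(co)  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix only records whether two objects share a cluster, it sidesteps the label correspondence problem between partitions, at the cost of quadratic memory in the number of objects.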



