Clustering ensembles combine multiple partitions generated by
different clustering algorithms into a single clustering solution.
They have emerged as a prominent method for improving the robustness,
stability and accuracy of unsupervised classification solutions. Many
contributions have been made toward finding a consensus clustering,
and one of the major problems in clustering ensembles is the choice
of consensus function. In this paper, we first introduce clustering
ensembles, the representation of multiple partitions and its
challenges, and present a taxonomy of combination algorithms.
Second, we describe consensus functions in clustering ensembles,
including hypergraph partitioning, the voting approach, mutual
information, co-association based functions and the finite mixture
model, and explain their advantages, disadvantages and computational
complexity. Finally, we compare the characteristics of clustering
ensemble algorithms, such as computational complexity, robustness,
simplicity and accuracy, on the different datasets used in previous
techniques.
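
As a rough illustration of what a consensus function does, the sketch below builds a co-association matrix (the fraction of partitions that place two objects in the same cluster) and derives one consensus partition from it. The function names `co_association` and `consensus_labels`, the 0.5 threshold and the toy partitions are assumptions made for this example. The thresholding step is a deliberately simple stand-in for the clustering step a co-association method would normally apply, not the procedure of any particular surveyed paper.

```python
def co_association(partitions):
    """Return an n x n matrix: the fraction of partitions in which
    objects i and j are assigned to the same cluster."""
    m = len(partitions)
    n = len(partitions[0])
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]


def consensus_labels(coassoc, threshold=0.5):
    """Link objects whose co-association exceeds `threshold` and label
    the connected components of the resulting graph (assumed rule)."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to an earlier component
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over strongly co-associated objects
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy partitions of six objects (hypothetical data).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, labels permuted
    [0, 0, 1, 1, 2, 2],  # a partition that disagrees with the other two
]
C = co_association(partitions)
labels = consensus_labels(C)
print(labels)
```

Because the matrix only records whether two objects share a cluster, it is unaffected by how each partition happens to number its clusters, which is why the first two partitions reinforce each other despite their permuted labels.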
@article{"International Journal of Information, Control and Computer Sciences:50045",
  author = "Reza Ghaemi and Md. Nasir Sulaiman and Hamidah Ibrahim and Norwati Mustapha",
  title = "A Survey: Clustering Ensembles Techniques",
  abstract = "Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability and accuracy of unsupervised classification solutions. Many contributions have been made toward finding a consensus clustering, and one of the major problems in clustering ensembles is the choice of consensus function. In this paper, we first introduce clustering ensembles, the representation of multiple partitions and its challenges, and present a taxonomy of combination algorithms. Second, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association based functions and the finite mixture model, and explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity and accuracy, on the different datasets used in previous techniques.",
  keywords = "Clustering Ensembles, Combinational Algorithm, Consensus Function, Unsupervised Classification.",
  volume = "3",
  number = "2",
  pages = "264-10",
}