A Review and Comparative Analysis on Cluster Ensemble Methods

Clustering is an unsupervised learning technique in data mining that groups data objects into meaningful classes so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. However, no single clustering algorithm consistently produces the best result. To address this problem, the cluster ensemble approach has emerged as a successful technique for cluster analysis. The main goal of a cluster ensemble is to combine multiple clustering solutions in a way that improves accuracy and improves on the quality of the individual clusterings. Because new approaches in data mining are being created rapidly and in large numbers, the ongoing interest in inventing novel algorithms calls for a thorough examination of current techniques and future directions. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working processes, and standard accuracy and error rates. This exploratory study should help the community of clustering practitioners determine the most appropriate solution to the problem at hand.
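To make "combining multiple clustering solutions" concrete, here is a minimal sketch of one well-known consensus function, evidence accumulation over co-association counts. The abstract does not say that this method is among those compared, so treat it as an illustration only. It assumes each base clustering is a list of integer labels over the same objects. The function name `consensus` and the majority threshold of 0.5 are illustrative choices.

```python
from itertools import combinations


def consensus(partitions, threshold=0.5):
    """Evidence-accumulation consensus of several base clusterings.

    Two objects are linked when they share a cluster in more than `threshold`
    of the base partitions. The connected components of these links form the
    consensus clustering, which is equivalent to cutting a single-linkage tree
    built on the co-association matrix at that level.
    """
    n, m = len(partitions[0]), len(partitions)
    parent = list(range(n))  # union-find forest over the n objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        # co-association count: how many base partitions put i and j together
        votes = sum(1 for labels in partitions if labels[i] == labels[j])
        if votes / m > threshold:
            parent[find(i)] = find(j)

    # relabel components as 0, 1, 2, ... in order of first appearance
    ids, result = {}, []
    for i in range(n):
        result.append(ids.setdefault(find(i), len(ids)))
    return result


base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],  # a noisier base clustering with three groups
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, different label names
]
print(consensus(base))  # [0, 0, 0, 1, 1, 1]
```

Label names in different base clusterings need not agree. Only co-membership is counted, which is why the third partition reinforces the first.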

Framework and Characterization of Physical Internet

In recent years, a new paradigm known as the Physical Internet has been developed and studied in logistics management. The purpose of this global and open system is to address the logistics grand challenge by setting up an efficient and sustainable Logistics Web. The purpose of this paper is to review scientific articles dedicated to the Physical Internet and to provide a clustering strategy for classifying the literature on the Physical Internet, following its evolution and critiquing it. The classification is based on three factors: Logistics Web, organization and resources. Several papers on the Physical Internet have been classified and analyzed along the Logistics Web, resources and organization views at the strategic, tactical and operational levels, respectively. A cluster analysis shows which Physical Internet topics are currently the least covered, and directions for future research on these topics are outlined.

An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia

A key issue in stock investment is how to select representative features for stock selection. The first objective of this paper is to determine whether an automated stock investment system using machine learning techniques can identify a portfolio of growth stocks that is highly likely to provide better returns than the stock market index. The second objective is to identify the technical features that best characterize whether a stock's price is likely to go up, and to identify the most important factors and their contribution to predicting the likelihood of a price rise. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to rise in price (portfolio 1). Next, principal component analysis was used to select stocks that were rated high on components one and two (portfolio 2). Third, a supervised machine learning technique, logistic regression, was used to select stocks with a high probability of a price rise (portfolio 3). All predictive models were validated with metrics such as sensitivity (recall), specificity and overall accuracy. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three portfolios and traded in the market for one month. After one month, the return of each portfolio was computed and compared with the stock market index return. The returns were 23.87% for the principal component analysis portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio, while the stock market returned 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top-performing stock portfolios that outperform the stock market.
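The three validation metrics named above all come from a binary confusion matrix. The sketch below assumes labels are coded 1 for "price went up" and 0 otherwise. The function name and the sample labels are hypothetical and are not the paper's data.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity (recall), specificity and accuracy for 0/1 labels (1 = price went up)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # rises correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-rises correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted a rise that did not happen
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed rises
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


actual = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(classification_metrics(actual, predicted))
# {'sensitivity': 0.8, 'specificity': 0.8, 'accuracy': 0.8}
```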

Intellectual Capital Disclosure: Profiles of Spanish Public Universities

In the higher education setting, there is a current trend in society toward greater openness and transparency. The economic, social and political changes that have occurred in public sector universities in recent years (particularly New Public Management, the Bologna Process and the emergence of the “third mission”) call for wider disclosure of the value created by universities. Such disclosure supports fundraising activities, ensures accountability for the use of public funds and for the outcomes of research and teaching, and fosters close relationships with industries and territories. The paper has two purposes: 1) to explore intellectual capital (IC) disclosure by Spanish universities through their websites, and 2) to identify university profiles. The study applies content analysis to the institutional websites of Spanish public universities, followed by cluster analysis. The analysis reveals that Spanish universities' website content usually relates to human capital, while structural and relational capital are less widely disclosed. Our research identifies three behavioral profiles of Spanish universities with regard to the online disclosure of IC: more proactive universities, less proactive universities, and universities that adopt a middle position. The results can encourage university managers to enhance online IC disclosure to meet the information needs of university stakeholders.
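Content analysis of websites is often summarized as a disclosure index. Each checklist item is coded 1 if disclosed and 0 otherwise, and the index is the share of items disclosed. The sketch below assumes this unweighted scoring, which the abstract does not specify. The checklist codes are hypothetical. Per-university scores of this kind are the sort of variables a cluster analysis could then group into profiles.

```python
def disclosure_index(coded_items):
    """Unweighted disclosure index per intellectual capital category.

    coded_items maps a category name to a list of 0/1 codes, one per
    checklist item (1 = item disclosed on the university website).
    """
    return {category: sum(codes) / len(codes) for category, codes in coded_items.items()}


# hypothetical coding of one university website
university = {
    "human": [1, 1, 1, 0],
    "structural": [1, 0, 0, 0],
    "relational": [0, 1, 0, 0],
}
print(disclosure_index(university))
# {'human': 0.75, 'structural': 0.25, 'relational': 0.25}
```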

Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Data on various aspects of education are regularly collected at the institutional and government levels. In Australia, for example, students at various levels of schooling take numeracy and literacy examinations as part of NAPLAN testing. These tests enable longitudinal assessment of the data as well as comparisons between schools and states within Australia. Another source of internationally collected educational data is the PISA study. PISA collects data from students aged approximately 15 in several countries and enables comparisons of performance in science, mathematics and reading between countries, as well as rankings of countries based on performance in these standardised tests. In addition to student and school outcomes based on the PISA tests, the study collects a wealth of other data, including parental demographics and the teaching strategies used by educators. Overall, an abundance of educational data is available that could help improve educational attainment and the teaching of content, and thereby learning outcomes. A multivariate assessment of such data allows multiple variables to be considered simultaneously. The present study uses this approach, with data obtained from the PISA study, to help develop profiles of students based on their mathematics performance.

A Design for Customer Preferences Model by Cluster Analysis of Geometric Features and Customer Preferences

In the design cycle, a main design task is to determine the external shape of the product. The external shape of a product is one of the key factors affecting customers' preferences and hence their motivation to buy it, especially for a consumer electronic product such as a mobile phone. The relationship between external shape and customer preferences therefore needs to be studied in order to enhance customers' purchase desire and action. In this research, a design for customer preferences model is developed to investigate the relationships between the external shape and the customer preferences of a product. In the first stage, the developed text miner collects the names of geometric features from specified internet web pages and evaluates them. Features whose number of occurrences on the web pages is relatively high are taken as the key geometric features. For each key geometric feature, the text miner then collects its numerical values from the web pages. In the second stage, a cluster analysis model is developed to evaluate the numerical values of the key geometric features and divide the external shapes into several groups. Several design suggestion cases can then be proposed, for example large, mid-size and mini models of a mobile phone. A customer preference index is developed by evaluating the numerical data of each key geometric feature of the design suggestion cases. The design suggestion case with the highest customer preference index can be selected as the final design of the product. In this paper, a notebook computer is used as the example product. The example shows that the external shape of a product can be used to drive customer preferences. The presented design for customer preferences model is useful for determining a suitable external shape of the product to increase customer preferences.
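The two text-mining steps described above are (1) keep feature names that occur often and (2) pull their numerical values from the pages. A minimal sketch follows. It is not the authors' text miner: it assumes plain page text, whole-word matching, and values written directly after the feature name. The functions `key_features` and `feature_values`, the threshold `min_count`, and the sample pages are all hypothetical.

```python
import re
from collections import Counter


def key_features(pages, candidates, min_count):
    """Return candidate feature names occurring at least min_count times, most frequent first."""
    counts = Counter()
    for text in pages:
        lowered = text.lower()
        for name in candidates:
            # whole-word, case-insensitive match of the feature name
            pattern = r"\b" + re.escape(name.lower()) + r"\b"
            counts[name] += len(re.findall(pattern, lowered))
    return [name for name, c in counts.most_common() if c >= min_count]


def feature_values(pages, name):
    """Collect numbers written directly after a feature name, e.g. 'weight: 1.8 kg'."""
    pattern = re.escape(name.lower()) + r"\s*:?\s*(\d+(?:\.\d+)?)"
    return [float(v) for text in pages for v in re.findall(pattern, text.lower())]


pages = [
    "Screen size 15.6 inch, weight 1.8 kg, thickness 18 mm",
    "Light weight and thin thickness; large screen size",
    "Weight: 2.1 kg",
]
candidates = ["screen size", "weight", "thickness", "hinge angle"]
print(key_features(pages, candidates, min_count=2))  # ['weight', 'screen size', 'thickness']
print(feature_values(pages, "weight"))               # [1.8, 2.1]
```

The collected values per key feature are the kind of numerical data that the second-stage cluster analysis would group into shape classes.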

A Construction Management Tool: Determining a Project Schedule Typical Behaviors Using Cluster Analysis

Delays in the construction industry are a global phenomenon. Many construction projects experience extensive delays beyond the initially estimated completion time. The main purpose of this study is to identify the typical behaviors of construction projects in order to develop a prognosis and management tool. Knowing a construction project's schedule tendency enables evidence-based decision-making, so that corrective action can be taken before delays occur. This study presents an innovative approach that uses cluster analysis to support predictions during Earned Value Analysis. Cluster analysis was used to predict the future behavior of schedules and of the principal Earned Value Management (EVM) and Earned Schedule (ES) indexes in construction projects. The analysis used a database of 90 different construction projects and was validated with additional data extracted from the literature and with another 15 contrasting projects. For all projects, planned and executed schedules were collected and the principal EVM and ES indexes were calculated. A complete linkage classification method was used. The distance (or similarity) between two clusters is therefore measured by their most disparate elements, i.e. the distance is given by the maximum span among their components. Finally, the statistical dissimilarity was verified using the EVM and ES indexes and Tukey and Fisher pairwise comparisons, and four clusters were obtained. Construction projects show an average delay of 35% of their planned completion time. Four typical behaviors were found, and for each cluster the interim milestones and the necessary rates of construction were identified.
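The complete linkage rule described above can be sketched as a naive agglomerative procedure. This is an illustration, not the study's software. It assumes each project is summarized by a numeric feature vector and that Euclidean distance is used. The points below are hypothetical two-feature vectors, not data from the 90 projects.

```python
import math
from itertools import combinations


def complete_linkage(points, k):
    """Agglomerative clustering with complete linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # complete linkage: the largest distance between members of a and b
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # merge the two clusters whose most distant members are closest
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.4), (5.0, 5.0), (5.5, 5.0), (9.0, 0.0)]
print(complete_linkage(points, k=3))  # [[0, 1, 2], [3, 4], [5]]
```

This brute-force version recomputes every pairwise linkage at each step. It is fine for a few hundred projects, but a library routine would normally be used for larger data.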
In general, the typical behaviors detected are: (1) projects that complete 5% of the work in the first two tenths of the planned time and then maintain a constant rate until completion (more than 10% for each remaining tenth), finishing on the initially estimated time; (2) projects that start with an adequate construction rate but suffer minor delays, culminating in a total delay of almost 27% of the planned time; (3) projects that start below the planned rate and end with an average delay of 64%; and (4) projects that begin with poor performance, suffer great delays and end with an average delay of 120% of the planned completion time. The obtained clusters form a tool for identifying the behavior of new construction projects by comparing their current work performance with the validated database. This allows initial estimates to be corrected toward more accurate completion schedules.
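For readers unfamiliar with the indexes mentioned in this abstract, the sketch below computes the standard Earned Schedule quantities: ES = C + (EV − PV_C) / (PV_{C+1} − PV_C), SPI = EV / PV, SPI(t) = ES / AT, and the duration forecast IEAC(t) = planned duration / SPI(t). These are the textbook definitions. The abstract does not state exactly which variants the study used. The planned-value curve is hypothetical, and time is counted in whole periods.

```python
def earned_schedule(pv, ev, at):
    """Earned Schedule indicators at actual time `at` (whole periods).

    pv: cumulative planned value at the end of periods 1..n
    ev: cumulative earned value at time `at`
    """
    cum = [0.0] + list(pv)  # cum[t] = planned value at the end of period t
    n = len(pv)             # planned duration in periods
    # C = last period whose planned value has already been earned
    c = max(t for t in range(n + 1) if cum[t] <= ev)
    es = float(c) if c == n else c + (ev - cum[c]) / (cum[c + 1] - cum[c])
    spi = ev / cum[min(at, n)]   # cost-based SPI = EV / PV at time `at`
    spi_t = es / at              # time-based SPI(t) = ES / AT
    ieac_t = n / spi_t           # forecast duration = planned duration / SPI(t)
    return es, spi, spi_t, ieac_t


planned = [5, 12, 22, 35, 50, 65, 78, 88, 95, 100]  # S-curve, % of budget
es, spi, spi_t, ieac_t = earned_schedule(planned, ev=45, at=6)
print(round(es, 2), round(spi, 2), round(spi_t, 2), round(ieac_t, 2))
# 4.67 0.69 0.78 12.86
```

With these made-up numbers, the forecast is about 12.9 periods against 10 planned, roughly a 29% overrun. A figure like this could be compared with the typical delays listed above. The cost-based SPI drifts back toward 1 as a late project nears completion, which is one reason ES-based indexes are often preferred for schedule forecasts.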

Cluster Analysis of Customer Churn in Telecom Industry

This research examines the factors that affect customer churn (CC) in the Jordanian telecom industry. A total of 700 surveys were distributed. Cluster analysis revealed three main clusters, and the results showed that CC and customer satisfaction (CS) were the key determinants in forming them. In two clusters, the center values of CC were high, indicating that the customers were loyal and that SC was expensive and time- and energy-consuming. Still, the mobile service provider (MSP) should enhance its communication (COM), its value-added services (VASs) and its customer complaint management systems (CCMS). Finally, in the third cluster the CC center indicates a poor level of loyalty, which makes it easier for customers to churn to another MSP. The results of this study provide valuable feedback for MSP decision makers on approaches to improving their performance and reducing CC.

Cluster Analysis of Retailers’ Benefits from Their Cooperation with Manufacturers: Business Models Perspective

A number of studies have discussed the benefits of cooperation and coopetition between retailers and manufacturers. However, only a few publications focus on the benefits of cooperation and coopetition between retailers and their suppliers of consumer durables, especially in the context of the business models of the cooperating partners. This paper aims to provide a clustering approach to segment retailers selling consumer durables according to the benefits they obtain from cooperating with key manufacturers. It also aims to differentiate these retailers in terms of the business models of the cooperating partners. For this purpose, a survey (using the CATI method) collected data on 603 consumer durables retailers present on the Polish market. Retailers are clustered with both hierarchical and non-hierarchical methods. Using this two-stage clustering approach, five distinct groups of consumer durables retailers are identified based on the studied benefits. The clusters are then characterized with a set of exogenous variables, the key ones being the business models employed by the retailer and by its partnering key manufacturer. The paper finds that the combination of a medium-sized retailer classified as an Integrator with chiefly domestic capital and a manufacturer categorized as a Market Player yields the highest benefits. At the other end of the spectrum is the medium-sized Distributor retailer with solely domestic capital. In this case, the business model of the cooperating manufacturer appears to be irrelevant. This paper is one of the first empirical studies to use cluster analysis on primary data to define the types of cooperation between consumer durables retailers and manufacturers, their key suppliers. The analysis integrates the perspectives of both retailers' and manufacturers' business models and matches them with individual and joint benefits.

Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Renewable energy is often referred to as "clean energy", and a common popular argument for using renewable energy (RE) is that it provides electricity with zero carbon dioxide emissions. This study provides useful insight into RE in the European Union (EU), especially into electricity generation from renewables and the associated targets. The objective of this study is to identify groups of European countries using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. This hierarchical cluster analysis is based on Ward's clustering method and squared Euclidean distances, and it identified eight distinct clusters of European countries. The non-hierarchical k-means method was then applied. Discriminant analysis was used to determine the validity of the results, with data normalized by Z-score transformation. To explore the relationships between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in the EU Member States.
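Ward's method on squared Euclidean distances can be sketched in a few lines. At each step, it merges the pair of clusters whose union increases the within-cluster sum of squares the least: |A||B| / (|A| + |B|) · ‖centroid(A) − centroid(B)‖². This is an illustration only. The sketch z-scores the indicators first because they may be on different scales. The country indicators are hypothetical, and every column is assumed to have non-zero spread.

```python
import statistics
from itertools import combinations


def z_scores(rows):
    """Standardize every column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def ward(rows, k):
    """Ward agglomerative clustering on squared Euclidean distances, cut at k clusters."""
    clusters = [[i] for i in range(len(rows))]

    def centroid(members):
        return [sum(rows[i][d] for i in members) / len(members) for d in range(len(rows[0]))]

    def merge_cost(a, b):
        # increase in the within-cluster sum of squares if a and b are merged
        sq = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# hypothetical indicators per country: renewable share of electricity (%), target (%)
indicators = [[10, 20], [12, 22], [11, 19], [60, 70], [62, 68], [35, 40]]
print(ward(z_scores(indicators), k=3))  # [[0, 1, 2], [3, 4], [5]]
```

In a two-stage design like the one described, the centroids of the Ward clusters are a natural starting point for the subsequent k-means step.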

Genetic Diversity Based Population Study of Freshwater Mud Eel (Monopterus cuchia) in Bangladesh

Genetic diversity is crucial for the existence, breeding and production of any fish. This study therefore investigated the genetic diversity of the freshwater mud eel, Monopterus cuchia, at the population level. Three ecological populations in Bangladesh were considered: the flooded area of Sylhet (P1) and the open waters of Moulvibazar (P2) and Sunamganj (P3) districts. Four arbitrary RAPD primers (OPB-12, C0-4, B-03 and OPB-08) were screened, and RAPD banding patterns were analyzed among the populations using 15 individuals per population. In total, 174, 138 and 149 bands were detected in P1, P2 and P3, respectively; however, each primer revealed fewer bands in each population. All loci (100%) were polymorphic in P2 and P3, whereas one monomorphic locus was observed in P1, giving 97.5% polymorphism. Genetic parameters such as inter-individual pairwise similarity, genetic distance, Nei's genetic similarity, linkage distance, cluster analysis and allelic information were used to measure genetic diversity. The average inter-individual pairwise similarity was 2.98, 1.47 and 1.35 in P1, P2 and P3, respectively. In the genetic distance analysis, the highest distance (1) was recorded in P2 and P3, and the lowest (0.444) was found in P2. The average Nei's genetic similarity was 0.19, 0.16 and 0.13 in P1, P2 and P3, respectively, while the average linkage distance was 24.92, 17.14 and 15.28 in P1, P3 and P2, respectively. Genetic clusters were generated for the three populations based on linkage distance: 6 clades and 7 clusters in P1, 3 clades and 5 clusters in P2, and 4 clades and 7 clusters in P3. The frequencies of the p and q alleles were 0.093 and 0.907 in P1, 0.076 and 0.924 in P2, and 0.074 and 0.926 in P3, respectively.
The average gene diversity was highest in P2 (0.132), followed by P3 (0.131) and P1 (0.121).
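The percentage of polymorphic loci, the p and q allele frequencies and the gene diversity can be derived from a 0/1 band matrix. With exactly one monomorphic locus, the 97.5% reported for P1 implies 39 polymorphic loci out of 40. The sketch below uses the common square-root estimator for dominant markers, which assumes Hardy-Weinberg equilibrium. The abstract does not state which estimator was used, so this is an assumption, and the band matrix is hypothetical.

```python
import math


def rapd_diversity(bands):
    """Diversity summaries for a 0/1 RAPD band matrix (rows = individuals, columns = loci).

    Assumes Hardy-Weinberg equilibrium, so the null (band-absent) allele frequency
    is q = sqrt(share of individuals lacking the band) and p = 1 - q.
    """
    n_ind = len(bands)
    loci = list(zip(*bands))
    # a locus is polymorphic if the band is present in some but not all individuals
    polymorphic = sum(1 for locus in loci if 0 < sum(locus) < n_ind)
    ps, qs, hs = [], [], []
    for locus in loci:
        q = math.sqrt((n_ind - sum(locus)) / n_ind)
        p = 1 - q
        ps.append(p)
        qs.append(q)
        hs.append(1 - p ** 2 - q ** 2)  # Nei's gene diversity at this locus (= 2pq)
    n_loci = len(loci)
    return {
        "percent_polymorphic": 100 * polymorphic / n_loci,
        "mean_p": sum(ps) / n_loci,
        "mean_q": sum(qs) / n_loci,
        "mean_gene_diversity": sum(hs) / n_loci,
    }


# hypothetical scoring of 4 individuals at 3 loci (1 = band present)
bands = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
print(rapd_diversity(bands))
```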

Authenticity of Lipid and Soluble Sugar Profiles of Various Oat Cultivars (Avena sativa)

Lipid and soluble sugar components were identified in flour samples of different cultivars of common oat (Avena sativa L.): spring oat, winter oat and hulless oat. Fatty acids were extracted from flour samples with n-hexane and derivatized into volatile methyl esters using TMSH (trimethylsulfonium hydroxide in methanol). Soluble sugars were then extracted from defatted and dried oat flour samples with 96% ethanol and derivatized into the corresponding TMS-oximes using hydroxylamine hydrochloride solution and BSTFA (N,O-bis-(trimethylsilyl)-trifluoroacetamide). The hexane and ethanol extracts of each oat cultivar were analyzed using a GC-MS system. Lipid and simple sugar compositions are very similar in all samples of the investigated cultivars. A chemometric tool was applied to the automatically integrated peak areas of the detected lipid and simple sugar components in their derivatized forms. Hierarchical cluster analysis shows a very high similarity between the flour samples of the investigated oat cultivars according to fatty acid content (0.9955), and moderate similarity according to soluble sugar content (0.50). These preliminary results support establishing methods for oat flour authentication. They also provide a means of distinguishing oat flour samples, regardless of variety, from flour made from other cereal species, using lipid and simple sugar profile analysis alone.

The Role of Knowledge Management in Innovation: Spanish Evidence

In the knowledge-based economy, innovation is considered essential for the survival and growth of organizations, and knowledge management is currently understood as one of the keys to the innovation process. Both factors are generally accepted as generators of competitive advantage in organizations. Specifically, R&D&I activities and those that generate internal knowledge have a positive influence on innovation results. This paper examines this effect and aims to quantify whether it is similar across groups of enterprises. We focus on the impact that four factors have on the variation in tangible and intangible returns in the Spanish high- and medium-technology sector: the proportion of knowledge workers, R&D&I investment, and the amounts allocated to ICTs and to training for innovation. To do this, we performed an empirical analysis of the results of questionnaires on innovation in Spanish enterprises collected by the National Statistics Institute. First, cluster analysis is used to identify the behavior of these enterprises regarding knowledge management. Then, for each cluster, structural equation modeling (SEM) is used to study the cause-effect relationships among constructs defined through variables, establishing their type and magnitude. The cluster analysis yields four groups. Clusters 1 and 3 show the best innovation performance, with differentiating nuances between them, while clusters 2 and 4 obtain divergent results from a similar innovative effort. However, the SEM results for each cluster show that, in all cases, knowledge workers affect innovation performance most, regardless of the level of investment, and that there is a strong correlation between knowledge workers and investment in knowledge generation.
The main finding is that Spanish high- and medium-technology companies improve their innovation performance by investing in internal knowledge generation measures, especially R&D activities, and underinvest in external ones. Managers should take this finding into account when making decisions about investments for innovation, together with the strong correlation between knowledge workers and the set of activities that promote knowledge generation, since these factors are key to improving their opportunities in the global market.

A Multivariate Statistical Approach for Water Quality Assessment of River Hindon, India

The River Hindon is an important river that caters to the demands of a densely populated rural and industrial cluster in western Uttar Pradesh, India. Its water quality is deteriorating at an alarming rate due to various industrial, municipal and agricultural activities. The present study aimed to identify the pollution sources and to quantify the degree to which these sources are responsible for the deteriorating water quality of the river. Various water quality parameters were assessed, including pH, temperature, electrical conductivity, total dissolved solids, total hardness, calcium, chloride, nitrate, sulphate, biological oxygen demand, chemical oxygen demand and total alkalinity. Water quality data obtained from eight study sites over one year were subjected to two multivariate techniques: principal component analysis and cluster analysis. Principal component analysis was applied to determine spatial variability and to identify the sources responsible for the water quality of the river. Three varifactors were obtained after varimax rotation of the initial principal components. Cluster analysis was carried out to classify sampling stations by similarity, and it grouped the eight sites into two clusters. The study reveals that anthropogenic influence (municipal, industrial, wastewater and agricultural runoff) was the major source of river water pollution. The study thus illustrates the utility of multivariate statistical techniques for several purposes: analyzing and interpreting multifaceted data sets, recognizing pollution sources and factors, and understanding temporal and spatial variations in water quality for effective river water quality management.
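The principal component step amounts to an eigen-decomposition of the correlation matrix of the standardized parameters. The sketch below uses simple power iteration with deflation as a stand-in for a proper eigensolver, and it omits the varimax rotation entirely. The three-parameter samples are hypothetical and are not the Hindon data.

```python
import math
import statistics


def correlation_matrix(rows):
    """Correlation matrix of the columns (variables) of `rows` (samples)."""
    z = []
    for c in zip(*rows):
        m, s = statistics.mean(c), statistics.stdev(c)
        z.append([(x - m) / s for x in c])
    n = len(rows)
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def principal_components(corr, n_components, iters=1000):
    """Leading eigenvalues/eigenvectors of a symmetric matrix via power iteration + deflation."""
    size = len(corr)
    m = [row[:] for row in corr]
    components = []
    for _ in range(n_components):
        v = [1.0] * size
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * m[i][j] * v[j] for i in range(size) for j in range(size))
        components.append((eigval, v))
        # deflation: remove the found component so the next pass finds the next one
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(size)] for i in range(size)]
    return components


# hypothetical samples: electrical conductivity, total dissolved solids, pH
samples = [[100, 65, 7.1], [150, 98, 7.4], [200, 131, 7.0], [250, 160, 7.3], [300, 196, 7.2]]
for eigval, vec in principal_components(correlation_matrix(samples), 2):
    # with a correlation matrix the eigenvalues sum to the number of variables
    print(round(eigval, 2), f"{100 * eigval / len(vec):.0f}% of variance")
```

Here conductivity and dissolved solids move together, so the first component absorbs about two variables' worth of variance. This is the kind of pattern a varimax rotation would then sharpen into interpretable varifactors.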

Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco

With the increase in the number of e-commerce sites, competition has become intense. Companies therefore have to make appropriate decisions to meet their customers' expectations and satisfy their needs. In this paper, we present a case study applying the LRFM (length, recency, frequency and monetary) model and clustering techniques in electronic commerce. The aim is to evaluate the customer value of Moroccan e-commerce websites and then develop effective marketing strategies. To achieve these objectives, we apply the LRFM model with a two-stage clustering method. In the first stage, the self-organizing map method is used to determine the best number of clusters and the initial centroids. In the second stage, the k-means method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that cluster 6 is the most important cluster because its average L, R, F and M values are all higher than the overall averages. In addition, this study considers another variable, the payment method used by customers, to improve and strengthen the cluster analysis. The cluster analysis demonstrates that the payment method is one of the key indicators of a new index for assessing the level of customers' confidence in the company's website.
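The second stage above amounts to k-means started from centroids supplied by the first stage, followed by a comparison of each cluster's average L, R, F and M values with the overall averages. The sketch below does not implement the self-organizing map. The initial centroids are simply given, and the customers are hypothetical, with L, R, F and M assumed to be already scaled to comparable ranges.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means started from given centroids (e.g. from a SOM in stage one)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no customer changed cluster
        labels = new_labels
        # update step: each centroid moves to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def lrfm_pattern(centroid, overall):
    """'+' where a cluster's average L, R, F or M exceeds the overall average, else '-'."""
    return "".join("+" if c > o else "-" for c, o in zip(centroid, overall))


# hypothetical customers with L, R, F, M already scaled to [0, 1]
customers = [(0.1, 0.2, 0.1, 0.1), (0.2, 0.1, 0.2, 0.2), (0.9, 0.8, 0.9, 0.9), (0.8, 0.9, 0.8, 0.8)]
labels, centroids = kmeans(customers, initial_centroids=[(0, 0, 0, 0), (1, 1, 1, 1)])
overall = [sum(col) / len(customers) for col in zip(*customers)]
print(labels)                                         # [0, 0, 1, 1]
print([lrfm_pattern(c, overall) for c in centroids])  # ['----', '++++']
```

A "++++" pattern corresponds to the situation described for cluster 6, where all four averages exceed the overall averages.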

Comparative Correlation Investigation of Polynuclear Aromatic Hydrocarbons (PAHs) in Soils of Different Land Use: Sources Evaluation Perspective

Polycyclic aromatic hydrocarbons (PAHs) are formed mainly through the incomplete combustion of organic materials during industrial and domestic activities, or by natural processes. Their toxicity and their contamination of terrestrial and aquatic ecosystems have been established. However, previous research on source identification has focused on PAH isomer-pair ratios of variable physicochemical properties, whose validity is limited. The objective of this investigation was to determine the empirical validity of the Pearson correlation coefficient (PCC) and cluster analysis (CA) for PAH source identification in soil samples from different land uses. Sixteen PAHs, grouped as endocrine-disrupting substances (EDSs), were therefore determined seasonally in topsoil and subsoil at 10 sampling stations. PAHs were determined using a Varian 300 gas chromatograph interfaced with a flame ionization detector. Instruments and reagents were of standard and chromatographic grades, respectively. The PCC and CA results showed that the classification of PAHs into pyrolytic and petrogenic sources used in source signatures reflects the predominance of PAHs in the environmental matrix. The distribution of PAHs at the studied stations revealed trace quantities of the vast majority of the sixteen PAHs, which may ultimately inhibit authentication of the actual source signature. We therefore recommend that the following factors be considered when evaluating possible sources of PAHs: the type and extent of bacterial metabolism, transformation products and substrates, and environmental factors such as salinity, pH, oxygen concentration, nutrients, light intensity, temperature, co-substrates and the environmental medium.
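The Pearson correlation coefficient used above can be computed for every pair of compounds measured at the same stations. The sketch below is illustrative and uses hypothetical concentrations for three compounds.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pairs(conc):
    """Pearson r for every pair of compounds measured at the same stations."""
    names = list(conc)
    return {(a, b): pearson(conc[a], conc[b]) for i, a in enumerate(names) for b in names[i + 1:]}


# hypothetical concentrations of three PAHs at five stations
conc = {
    "phenanthrene": [1.0, 2.0, 3.0, 4.0, 5.0],
    "anthracene": [2.1, 3.9, 6.2, 8.0, 9.8],
    "pyrene": [5.0, 4.1, 2.9, 2.2, 0.8],
}
for pair, r in correlation_pairs(conc).items():
    print(pair, round(r, 3))
```

Strongly correlated compounds are often read as sharing a source. One common, though not the only, way to feed such a table into cluster analysis is to use 1 − r as the dissimilarity between compounds.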

Off-Line Detection of “Pannon Wheat” Milling Fractions by Near-Infrared Spectroscopic Methods

The aim of this investigation is to develop near-infrared methods for testing and recognizing chemical components and quality in “Pannon wheat” allied (i.e. true-to-variety or variety-identified) milling fractions. A second aim is to develop spectroscopic methods for following the milling process and for evaluating the stability of the milling technology across different types of milling products and sampling times. These wheat categories were produced under industrial conditions, and samples were collected as a function of sampling time and at maximum or minimum yields. Changes in the main chemical components of the fractions (such as starch, protein and lipid) and in their physical properties (particle size) were analysed with dispersive spectrophotometers in the visible (VIS) and near-infrared (NIR) regions of the electromagnetic spectrum. The spectroscopic data were processed by various chemometric methods (e.g. principal component analysis [PCA] and cluster analysis [CA]), and close correlations were obtained between these data and the operating conditions of the milling technology. The results show that NIR methods can detect deviations in the yield parameters and differences between sampling times across a wide variety of fractions. NIR technology can therefore be used for sensitive monitoring of milling technology.

A Study on the Relation among Primary Care Professionals Serving the Disadvantaged Community, Socioeconomic Status, and Adverse Health Outcome

During the post-Civil War era, the city of Nashville, Tennessee, had the highest mortality rate in the United States. The elevated death and disease rates among former slaves were attributable to a lack of quality healthcare. To address the paucity of healthcare services, Meharry Medical College, an institution with the mission of educating minority professionals and serving the underserved population, was established in 1876. Purpose: The social ecological framework and partial least squares (PLS) path modeling were used to quantify the impact of socioeconomic status and adverse health outcomes on primary care professionals serving the disadvantaged community. The study results could thus demonstrate whether the College has accomplished its mission of training primary care professionals to serve in underserved areas. Methods: Various statistical methods were used to analyze alumni data from 1975 to 2013. K-means cluster analysis was used to assign individual medical and dental graduates to clusters of practice communities (Disadvantaged or Non-disadvantaged Communities). Discriminant analysis was used to verify the classification accuracy of the cluster analysis. The independent t-test was performed to detect significant mean differences in the respective clustering and criterion variables. A chi-square test was used to test whether the proportions of primary care and non-primary care specialists are consistent with those of medical and dental graduates practicing in the designated community clusters. Finally, the PLS path model was constructed to explore the construct validity of the analytic model by estimating the magnitude of the effects of socioeconomic status and adverse health outcomes on primary care professionals serving the disadvantaged community. Results: Approximately 83% (3,192/3,864) of Meharry Medical College's medical and dental graduates from 1975 to 2013 were practicing in disadvantaged communities.
The independent t-test confirmed the content validity of the cluster analysis model. The PLS path modeling also demonstrated that alumni served as primary care professionals in communities with significantly lower socioeconomic status and higher adverse health outcomes (p < .001). It further showed a meaningful interrelation between the communities where primary care professionals practice and their surrounding environments (socioeconomic status and adverse health outcomes), and the model showed good reliability, validity and applicability. Conclusion: This study applied social ecological theory and analytic modeling approaches to assess the attainment of Meharry Medical College's mission of training primary care professionals to serve in underserved areas, particularly in communities with low socioeconomic status and high rates of adverse health outcomes. In summary, the majority of medical and dental graduates from Meharry Medical College provided primary care services to disadvantaged communities with low socioeconomic status and high adverse health outcomes, which demonstrates that Meharry Medical College has fulfilled its mission. The high reliability, validity and applicability of this model imply that it could be replicated for comparable universities and colleges elsewhere.
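The independent t-test used to compare clusters reduces to the following statistic. This sketch assumes the classic Student's version with pooled variance, since the abstract does not say whether a pooled or Welch test was used. It uses hypothetical values of one variable in two community clusters. The p-value would come from the t distribution with the returned degrees of freedom; SciPy's `scipy.stats.ttest_ind` computes both the statistic and the p-value.

```python
import math
import statistics


def independent_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# hypothetical values of one clustering variable in two community clusters
cluster_a = [1, 2, 3, 4, 5]
cluster_b = [3, 4, 5, 6, 7]
print(independent_t(cluster_a, cluster_b))  # (-2.0, 8)
```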

Various Advanced Statistical Analyses of Index Values Extracted from Outdoor Agricultural Workers Motion Data

We have been grouping and developing various practical and promising sensing-based systems concerning agricultural advancement and technical tradition (guidance). These include advanced devices that capture real-time data on worker motion. We analyze these data using various advanced statistical methods and human dynamics (e.g. principal component analysis, cluster analysis based on Ward's method, and mapping). We have also been considering workers' daily health and safety issues. The targeted fields are mainly common farms, meadows and gardens. We then observed and discussed time-series changes in the data and made some suggestions. The overall plan makes it possible to improve both the aforementioned systems and the farms.

A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

An extensive amount of work has been done during the past two decades on data clustering, an unsupervised learning technique in data mining. Several approaches and methods have emerged that focus on clustering diverse data types, on features of cluster models and on the similarity rates of clusters. However, no single clustering algorithm is consistently best at extracting efficient clusters. To rectify this issue, a new technique called the cluster ensemble method has emerged as an alternative approach to the cluster analysis problem. The main objective of a cluster ensemble is to aggregate diverse clustering solutions so as to attain accuracy and to improve on the quality of the individual clustering algorithms. Because new methods are developing massively and rapidly in the field of data mining, a critical analysis of existing techniques and future directions is essential. This paper presents a comparative analysis of different cluster ensemble methods along with their methodologies and salient features. This clear analysis should be useful to the community of clustering experts and help in deciding the most appropriate method for the problem at hand.
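Clustering mixed data needs a dissimilarity that handles numeric and categorical attributes together. Gower's coefficient is one widely used option. The abstract does not say which measure the reviewed ensemble methods rely on, so this is only an illustration. It assumes equal attribute weights and uses hypothetical records.

```python
def gower(x, y, ranges):
    """Gower dissimilarity between two mixed-type records.

    ranges[i] is (max - min) of numeric attribute i over the data set,
    or None when attribute i is categorical.
    """
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if r is None:   # categorical attribute: simple mismatch
            total += 0.0 if a == b else 1.0
        elif r > 0:     # numeric attribute: range-normalised absolute difference
            total += abs(a - b) / r
        # a numeric attribute with zero range contributes 0
    return total / len(x)


# hypothetical customer records: age, preferred channel, rating
print(gower((25, "web", 3.0), (35, "store", 3.0), ranges=(40, None, 2.0)))
```

Here the age difference contributes 10/40, the channel mismatch contributes 1 and the identical rating contributes 0, for an average of 1.25/3.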