Abstract: An extensive amount of work has been done in data
clustering research under the unsupervised learning technique in Data
Mining during the past two decades. Moreover, several approaches
and methods have been emerged focusing on clustering diverse data
types, features of cluster models and similarity rates of clusters.
However, none of the single clustering algorithm exemplifies its best
nature in extracting efficient clusters. Consequently, in order to
rectify this issue, a new challenging technique called Cluster
Ensemble method was bloomed. This new approach tends to be the
alternative method for the cluster analysis problem. The main
objective of the Cluster Ensemble is to aggregate the diverse
clustering solutions in such a way to attain accuracy and also to
improve the eminence the individual clustering algorithms. Due to
the massive and rapid development of new methods in the globe of
data mining, it is highly mandatory to scrutinize a vital analysis of
existing techniques and the future novelty. This paper shows the
comparative analysis of different cluster ensemble methods along
with their methodologies and salient features. Henceforth this
unambiguous analysis will be very useful for the society of clustering
experts and also helps in deciding the most appropriate one to resolve
the problem in hand.
Abstract: We compare three categorical data clustering
algorithms with respect to the problem of classifying cultural data
related to the aesthetic judgment of comics artists. Such a
classification is very important in Comics Art theory since the
determination of any classes of similarities in such kind of data will
provide to art-historians very fruitful information of Comics Art-s
evolution. To establish this, we use a categorical data set and we
study it by employing three categorical data clustering algorithms.
The performances of these algorithms are compared each other,
while interpretations of the clustering results are also given.
Abstract: Wireless Sensor Network is Multi hop Self-configuring
Wireless Network consisting of sensor nodes. The deployment of
wireless sensor networks in many application areas, e.g., aggregation
services, requires self-organization of the network nodes into clusters.
Efficient way to enhance the lifetime of the system is to partition the
network into distinct clusters with a high energy node as cluster head.
The different methods of node clustering techniques have appeared in
the literature, and roughly fall into two families; those based on the
construction of a dominating set and those which are based solely on
energy considerations. Energy optimized cluster formation for a set
of randomly scattered wireless sensors is presented. Sensors within a
cluster are expected to be communicating with cluster head only. The
energy constraint and limited computing resources of the sensor nodes
present the major challenges in gathering the data. In this paper we
propose a framework to study how partially correlated data affect the
performance of clustering algorithms. The total energy consumption
and network lifetime can be analyzed by combining random geometry
techniques and rate distortion theory. We also present the relation
between compression distortion and data correlation.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns where different genes are co-related based on a
subset of conditions in gene expression analysis. Association rules
mining is an efficient approach to achieve biclustering as in
BIMODULE algorithm but it is sensitive to the value given to its
input parameters and the discretization procedure used in the
preprocessing step, also when noise is present, classical association
rules miners discover multiple small fragments of the true bicluster,
but miss the true bicluster itself. This paper formally presents a
generalized noise tolerant bicluster model, termed as μBicluster. An
iterative algorithm termed as BIDENS based on the proposed model
is introduced that can discover a set of k possibly overlapping
biclusters simultaneously. Our model uses a more flexible method to
partition the dimensions to preserve meaningful and significant
biclusters. The proposed algorithm allows discovering biclusters that
hard to be discovered by BIMODULE. Experimental study on yeast,
human gene expression data and several artificial datasets shows that
our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: In blended learning environments, the Internet can be combined with other technologies. The aim of this research was to design, introduce and validate a model to support synchronous and asynchronous activities by managing content domains in an Adaptive Hypermedia System (AHS). The application is based on information recovery techniques, clustering algorithms and adaptation rules to adjust the user's model to contents and objects of study. This system was applied to blended learning in higher education. The research strategy used was the case study method. Empirical studies were carried out on courses at two universities to validate the model. The results of this research show that the model had a positive effect on the learning process. The students indicated that the synchronous and asynchronous scenario is a good option, as it involves a combination of work with the lecturer and the AHS. In addition, they gave positive ratings to the system and stated that the contents were adapted to each user profile.
Abstract: Wavelet neural networks (WNNs) have emerged as a vital alternative to the vastly studied multilayer perceptrons (MLPs) since its first implementation. In this paper, we applied various clustering algorithms, namely, K-means (KM), Fuzzy C-means (FCM), symmetry-based K-means (SBKM), symmetry-based Fuzzy C-means (SBFCM) and modified point symmetry-based K-means (MPKM) clustering algorithms in choosing the translation parameter of a WNN. These modified WNNs are further applied to the heterogeneous cancer classification using benchmark microarray data and were compared against the conventional WNN with random initialization method. Experimental results showed that a WNN classifier with the MPKM algorithm is more precise than the conventional WNN as well as the WNNs with other clustering algorithms.
Abstract: Our objective in this paper is to propose an approach
capable of clustering web messages. The clustering is carried out by
assigning, with a certain probability, texts written by the same web
user to the same cluster based on Stylometric features and using
fuzzy clustering algorithms. Focus in the present work is on
comparing the most popular algorithms in fuzzy clustering theory
namely, Fuzzy C-means, Possibilistic C-means and Fuzzy
Possibilistic C-Means.
Abstract: Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.
Abstract: Although there have been many researches in cluster
analysis to consider on feature weights, little effort is made on sample
weights. Recently, Yu et al. (2011) considered a probability
distribution over a data set to represent its sample weights and then
proposed sample-weighted clustering algorithms. In this paper, we
give a sample-weighted version of generalized fuzzy clustering
regularization (GFCR), called the sample-weighted GFCR
(SW-GFCR). Some experiments are considered. These experimental
results and comparisons demonstrate that the proposed SW-GFCR is
more effective than the most clustering algorithms.
Abstract: In this paper we study the fuzzy c-mean clustering algorithm
combined with principal components method. Demonstratively
analysis indicate that the new clustering method is well rather than
some clustering algorithms. We also consider the validity of clustering
method.
Abstract: In this paper, a clustering algorithm named KHarmonic
means (KHM) was employed in the training of Radial
Basis Function Networks (RBFNs). KHM organized the data in
clusters and determined the centres of the basis function. The popular
clustering algorithms, namely K-means (KM) and Fuzzy c-means
(FCM), are highly dependent on the initial identification of elements
that represent the cluster well. In KHM, the problem can be avoided.
This leads to improvement in the classification performance when
compared to other clustering algorithms. A comparison of the
classification accuracy was performed between KM, FCM and KHM.
The classification performance is based on the benchmark data sets:
Iris Plant, Diabetes and Breast Cancer. RBFN training with the KHM
algorithm shows better accuracy in classification problem.