Abstract: Biclustering is a data mining technique that, in gene
expression analysis, identifies patterns in which different genes are
correlated over a subset of conditions. Association rule mining is
an efficient approach to biclustering, as in the BIMODULE
algorithm, but it is sensitive to the values of its input parameters
and to the discretization procedure used in the preprocessing
step. Moreover, when noise is present, classical association rule
miners discover multiple small fragments of the true bicluster but
miss the true bicluster itself. This paper formally presents a
generalized noise-tolerant bicluster model, termed μBicluster. An
iterative algorithm based on the proposed model, termed BIDENS,
is introduced; it can discover a set of k possibly overlapping
biclusters simultaneously. Our model uses a more flexible method to
partition the dimensions to preserve meaningful and significant
biclusters. The proposed algorithm can discover biclusters that
are hard for BIMODULE to discover. Experiments on yeast and
human gene expression data and on several artificial datasets show
that our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: This paper introduces two new algorithms for fuzzy
clustering of relational data: FCLARANS, a fuzzy relative of the
CLARANS algorithm, and FCMRANS, fuzzy c-medoids based on
randomized search. Unlike the existing fuzzy c-medoids algorithm (FCMdd),
in which the within-cluster dissimilarity of each cluster is minimized
in each iteration by recomputing new medoids given the current
memberships, FCLARANS minimizes the same objective function
as FCMdd by changing the current medoids in such a way
that the sum of the within-cluster dissimilarities is minimized.
Recomputing medoids may be affected by noise because outliers
take part in the computation, whereas the choice of medoids in
FCLARANS is dictated by the location of a predominant fraction of
points inside a cluster and is therefore less sensitive to the
presence of outliers. In FCMRANS, the step of computing new
medoids in FCMdd is modified to be based on randomized search.
Furthermore, a new initialization procedure is developed that adds
randomness to the initialization procedure used with FCMdd. Both
FCLARANS and FCMRANS are compared with the robust and
linearized version of fuzzy c-medoids (RFCMdd). Experimental
results on different samples of the Reuters-21578 and Newsgroups
(20NG) collections and on generated datasets with noise show that
FCLARANS is more robust than both RFCMdd and FCMRANS. Finally,
both FCMRANS and FCLARANS are more efficient, and their outputs
are almost the same as those of RFCMdd in terms of classification
rate.
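
The contrast between recomputing medoids from memberships and changing medoids by search can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used fuzzy c-medoids objective (the sum of membership^m times dissimilarity to the medoid) and a CLARANS-style random single-medoid swap. All names (`memberships`, `objective`, `fcmdd_update`, `random_swap_search`), the fuzzifier default `m=2.0`, the `eps` guard and the `max_neighbors` limit are hypothetical choices made for illustration.

```python
import random


def memberships(D, medoids, m=2.0, eps=1e-12):
    """Fuzzy membership u[i][j] of object j in the cluster of medoids[i].

    D is a symmetric dissimilarity matrix (relational data). eps is an
    assumed guard against division by zero when an object is a medoid.
    """
    n = len(D)
    expo = 1.0 / (m - 1.0)
    u = [[0.0] * n for _ in medoids]
    for j in range(n):
        w = [(1.0 / (D[j][v] + eps)) ** expo for v in medoids]
        total = sum(w)
        for i in range(len(medoids)):
            u[i][j] = w[i] / total
    return u


def objective(D, medoids, m=2.0):
    """Sum of within-cluster dissimilarities weighted by u^m."""
    u = memberships(D, medoids, m)
    return sum(u[i][j] ** m * D[j][v]
               for i, v in enumerate(medoids)
               for j in range(len(D)))


def fcmdd_update(D, medoids, m=2.0):
    """FCMdd-style step: recompute each medoid given current memberships.

    Every object, including outliers, contributes to each weighted sum.
    """
    u = memberships(D, medoids, m)
    n = len(D)
    return [min(range(n),
                key=lambda q: sum(u[i][j] ** m * D[j][q] for j in range(n)))
            for i in range(len(medoids))]


def random_swap_search(D, medoids, m=2.0, max_neighbors=50, seed=0):
    """CLARANS-style step: try random medoid swaps, keep only improvements."""
    rng = random.Random(seed)
    best = list(medoids)
    best_cost = objective(D, best, m)
    tries = 0
    while tries < max_neighbors:
        i = rng.randrange(len(best))
        candidates = [q for q in range(len(D)) if q not in best]
        if not candidates:
            break
        trial = best[:i] + [rng.choice(candidates)] + best[i + 1:]
        cost = objective(D, trial, m)
        if cost < best_cost:
            best, best_cost, tries = trial, cost, 0  # accept and restart count
        else:
            tries += 1
    return best, best_cost
```

As a usage example, a dissimilarity matrix can be built from a handful of one-dimensional points with a single far outlier, for instance `D = [[abs(a - b) for b in pts] for a in pts]`, and passed to either function with an initial list of medoid indices. The swap search never accepts a medoid set with a higher objective value than the one it started from.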