Abstract: The clustering ensembles combine multiple partitions
generated by different clustering algorithms into a single clustering
solution. Clustering ensembles have emerged as a prominent method
for improving robustness, stability and accuracy of unsupervised
classification solutions. So far, many contributions have been done to
find consensus clustering. One of the major problems in clustering
ensembles is the consensus function. In this paper, firstly, we
introduce clustering ensembles, representation of multiple partitions,
its challenges and present taxonomy of combination algorithms.
Secondly, we describe consensus functions in clustering ensembles
including Hypergraph partitioning, Voting approach, Mutual
information, Co-association based functions and Finite mixture
model, and next explain their advantages, disadvantages and
computational complexity. Finally, we compare the characteristics of
clustering ensembles algorithms such as computational complexity,
robustness, simplicity and accuracy on different datasets in previous
techniques.
Abstract: Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.
Abstract: A wireless sensor network with a large number of tiny sensor nodes can be used as an effective tool for gathering data in various situations. One of the major issues in wireless sensor networks is developing an energy-efficient routing protocol which has a significant impact on the overall lifetime of the sensor network. In this paper, we propose a novel hierarchical with static clustering routing protocol called Energy-Efficient Protocol with Static Clustering (EEPSC). EEPSC, partitions the network into static clusters, eliminates the overhead of dynamic clustering and utilizes temporary-cluster-heads to distribute the energy load among high-power sensor nodes; thus extends network lifetime. We have conducted simulation-based evaluations to compare the performance of EEPSC against Low-Energy Adaptive Clustering Hierarchy (LEACH). Our experiment results show that EEPSC outperforms LEACH in terms of network lifetime and power consumption minimization.
Abstract: Tungsten trioxide has been prepared by using P-PTA
as a precursor on alumina substrates by spin coating method.
Palladium introduced on WO3 film via electrolysis deposition by
using palladium chloride as catalytic precursor. The catalytic
precursor was introduced on the series of films with different
morphologies. X-ray diffractometry (XRD), Scanning electron
microscopy (SEM) and XPS were applied to analyze structure and
morphology of the fabricated thin films. Then we measured variation
of samples- electrical conductivity of pure and Pd added films in air
and diluted hydrogen. Addition of Pd resulted in a remarkable
improvement of the hydrogen sensing properties of WO3 by detection
of Hydrogen below 1% at room temperature. Also variation of the
electrical conductivity in the presence of diluted hydrogen revealed
that response of samples depends rather strongly on the palladium
configuration on the surface.
Abstract: This paper proposes a new approach to offer a private
cloud service in HPC clusters. In particular, our approach relies on
automatically scheduling users- customized environment request as a
normal job in batch system. After finishing virtualization request jobs,
those guest operating systems will dismiss so that compute nodes will
be released again for computing. We present initial work on the
innovative integration of HPC batch system and virtualization tools
that aims at coexistence such that they suffice for meeting the
minimizing interference required by a traditional HPC cluster. Given
the design of initial infrastructure, the proposed effort has the potential
to positively impact on synergy model. The results from the
experiment concluded that goal for provisioning customized cluster
environment indeed can be fulfilled by using virtual machines, and
efficiency can be improved with proper setup and arrangements.
Abstract: In Multiple Sclerosis, pathological changes in the
brain results in deviations in signal intensity on Magnetic Resonance
Images (MRI). Quantitative analysis of these changes and their
correlation with clinical finding provides important information for
diagnosis. This constitutes the objective of our work. A new approach
is developed. After the enhancement of images contrast and the brain
extraction by mathematical morphology algorithm, we proceed to the
brain segmentation. Our approach is based on building statistical
model from data itself, for normal brain MRI and including clustering
tissue type. Then we detect signal abnormalities (MS lesions) as a
rejection class containing voxels that are not explained by the built
model. We validate the method on MR images of Multiple Sclerosis
patients by comparing its results with those of human expert
segmentation.
Abstract: Interpretation of aerial images is an important task in
various applications. Image segmentation can be viewed as the essential
step for extracting information from aerial images. Among many
developed segmentation methods, the technique of clustering has been
extensively investigated and used. However, determining the number
of clusters in an image is inherently a difficult problem, especially
when a priori information on the aerial image is unavailable. This
study proposes a support vector machine approach for clustering
aerial images. Three cluster validity indices, distance-based index,
Davies-Bouldin index, and Xie-Beni index, are utilized as quantitative
measures of the quality of clustering results. Comparisons on the
effectiveness of these indices and various parameters settings on the
proposed methods are conducted. Experimental results are provided
to illustrate the feasibility of the proposed approach.
Abstract: This paper investigates the spatial structure of employment in the Jakarta Metropolitan Area (JMA), with reference to the concept of the Southeast Asian extended metropolitan region (EMR). A combination of factor analysis and local Getis-Ord (Gi*) hot-spot analysis is used to identify clusters of employment in the region, including those of the urban and agriculture sectors. Spatial statistical analysis is further used to probe the spatial association of identified employment clusters with their surroundings on several dimensions, including the spatial association between the central business district (CBD) in Jakarta city on employment density in the region, the spatial impacts of urban expansion on population growth and the degree of urban-rural interaction. The degree of spatial interaction for the whole JMA is measured by the patterns of commuting trips destined to the various employment clusters. Results reveal the strong role of the urban core of Jakarta, and the regional CBD, as the centre for mixed job sectors such as retail, wholesale, services and finance. Manufacturing and local government services, on the other hand, form corridors radiating out of the urban core, reaching out to the agriculture zones in the fringes. Strong associations between the urban expansion corridors and population growth, and urban-rural mix, are revealed particularly in the eastern and western parts of JMA. Metropolitan wide commuting patterns are focussed on the urban core of Jakarta and the CBD, while relatively local commuting patterns are shown to be prevalent for the employment corridors.
Abstract: The complex hybrid and nonlinear nature of many processes that are met in practice causes problems with both structure modelling and parameter identification; therefore, obtaining a model that is suitable for MPC is often a difficult task. The basic idea of this paper is to present an identification method for a piecewise affine (PWA) model based on a fuzzy clustering algorithm. First we introduce the PWA model. Next, we tackle the identification method. We treat the fuzzy clustering algorithm, deal with the projections of the fuzzy clusters into the input space of the PWA model and explain the estimation of the parameters of the PWA model by means of a modified least-squares method. Furthermore, we verify the usability of the proposed identification approach on a hybrid nonlinear batch reactor example. The result suggest that the batch reactor can be efficiently identified and thus formulated as a PWA model, which can eventually be used for model predictive control purposes.
Abstract: In this paper, we present a comparative study between two computer vision systems for objects recognition and tracking, these algorithms describe two different approach based on regions constituted by a set of pixels which parameterized objects in shot sequences. For the image segmentation and objects detection, the FCM technique is used, the overlapping between cluster's distribution is minimized by the use of suitable color space (other that the RGB one). The first technique takes into account a priori probabilities governing the computation of various clusters to track objects. A Parzen kernel method is described and allows identifying the players in each frame, we also show the importance of standard deviation value research of the Gaussian probability density function. Region matching is carried out by an algorithm that operates on the Mahalanobis distance between region descriptors in two subsequent frames and uses singular value decomposition to compute a set of correspondences satisfying both the principle of proximity and the principle of exclusion.
Abstract: Documents clustering become an essential technology
with the popularity of the Internet. That also means that fast and
high-quality document clustering technique play core topics. Text
clustering or shortly clustering is about discovering semantically
related groups in an unstructured collection of documents. Clustering
has been very popular for a long time because it provides unique
ways of digesting and generalizing large amounts of information.
One of the issues of clustering is to extract proper feature (concept)
of a problem domain. The existing clustering technology mainly
focuses on term weight calculation. To achieve more accurate
document clustering, more informative features including concept
weight are important. Feature Selection is important for clustering
process because some of the irrelevant or redundant feature may
misguide the clustering results. To counteract this issue, the proposed
system presents the concept weight for text clustering system
developed based on a k-means algorithm in accordance with the
principles of ontology so that the important of words of a cluster can
be identified by the weight values. To a certain extent, it has resolved
the semantic problem in specific areas.
Abstract: The belief K-modes method (BKM) approach is a new
clustering technique handling uncertainty in the attribute values of
objects in both the cluster construction task and the classification one.
Like the standard version of this method, the BKM results depend on
the chosen initial modes. So, one selection method of initial modes
is developed, in this paper, aiming at improving the performances of
the BKM approach. Experiments with several sets of real data show
that by considered the developed selection initial modes method, the
clustering algorithm produces more accurate results.
Abstract: Clustering in high dimensional space is a difficult
problem which is recurrent in many fields of science and
engineering, e.g., bioinformatics, image processing, pattern
reorganization and data mining. In high dimensional space some of
the dimensions are likely to be irrelevant, thus hiding the possible
clustering. In very high dimensions it is common for all the objects in
a dataset to be nearly equidistant from each other, completely
masking the clusters. Hence, performance of the clustering algorithm
decreases.
In this paper, we propose an algorithmic framework which
combines the (reduct) concept of rough set theory with the k-means
algorithm to remove the irrelevant dimensions in a high dimensional
space and obtain appropriate clusters. Our experiment on test data
shows that this framework increases efficiency of the clustering
process and accuracy of the results.
Abstract: Selection of the best possible set of suppliers has a
significant impact on the overall profitability and success of any
business. For this reason, it is usually necessary to optimize all
business processes and to make use of cost-effective alternatives for
additional savings. This paper proposes a new efficient context-aware
supplier selection model that takes into account possible changes of
the environment while significantly reducing selection costs. The
proposed model is based on data clustering techniques while
inspiring certain principles of online algorithms for an optimally
selection of suppliers. Unlike common selection models which re-run
the selection algorithm from the scratch-line for any decision-making
sub-period on the whole environment, our model considers the
changes only and superimposes it to the previously defined best set
of suppliers to obtain a new best set of suppliers. Therefore, any recomputation
of unchanged elements of the environment is avoided
and selection costs are consequently reduced significantly. A
numerical evaluation confirms applicability of this model and proves
that it is a more optimal solution compared with common static
selection models in this field.
Abstract: Wireless Sensor Networks (WSNs) are wireless
networks consisting of number of tiny, low cost and low power
sensor nodes to monitor various physical phenomena like
temperature, pressure, vibration, landslide detection, presence of any
object, etc. The major limitation in these networks is the use of nonrechargeable
battery having limited power supply. The main cause of
energy consumption WSN is communication subsystem. This paper
presents an efficient grid formation/clustering strategy known as Grid
based level Clustering and Aggregation of Data (GCAD). The
proposed clustering strategy is simple and scalable that uses low duty
cycle approach to keep non-CH nodes into sleep mode thus reducing
energy consumption. Simulation results demonstrate that our
proposed GCAD protocol performs better in various performance
metrics.
Abstract: This paper introduces new algorithms (Fuzzy relative
of the CLARANS algorithm FCLARANS and Fuzzy c Medoids
based on randomized search FCMRANS) for fuzzy clustering of
relational data. Unlike existing fuzzy c-medoids algorithm (FCMdd)
in which the within cluster dissimilarity of each cluster is minimized
in each iteration by recomputing new medoids given current
memberships, FCLARANS minimizes the same objective function
minimized by FCMdd by changing current medoids in such away
that that the sum of the within cluster dissimilarities is minimized.
Computing new medoids may be effected by noise because outliers
may join the computation of medoids while the choice of medoids in
FCLARANS is dictated by the location of a predominant fraction of
points inside a cluster and, therefore, it is less sensitive to the
presence of outliers. In FCMRANS the step of computing new
medoids in FCMdd is modified to be based on randomized search.
Furthermore, a new initialization procedure is developed that add
randomness to the initialization procedure used with FCMdd. Both
FCLARANS and FCMRANS are compared with the robust and
linearized version of fuzzy c-medoids (RFCMdd). Experimental
results with different samples of the Reuter-21578, Newsgroups
(20NG) and generated datasets with noise show that FCLARANS is
more robust than both RFCMdd and FCMRANS. Finally, both
FCMRANS and FCLARANS are more efficient and their outputs
are almost the same as that of RFCMdd in terms of classification
rate.
Abstract: In this research, the diabetes conditions of people (healthy, prediabete and diabete) were tried to be identified with noninvasive palm perspiration measurements. Data clusters gathered from 200 subjects were used (1.Individual Attributes Cluster and 2. Palm Perspiration Attributes Cluster). To decrase the dimensions of these data clusters, Principal Component Analysis Method was used. Data clusters, prepared in that way, were classified with Support Vector Machines. Classifications with highest success were 82% for Glucose parameters and 84% for HbA1c parametres.
Abstract: This paper presents a new approach for image
segmentation by applying Pillar-Kmeans algorithm. This
segmentation process includes a new mechanism for clustering the
elements of high-resolution images in order to improve precision and
reduce computation time. The system applies K-means clustering to
the image segmentation after optimized by Pillar Algorithm. The
Pillar algorithm considers the pillars- placement which should be
located as far as possible from each other to withstand against the
pressure distribution of a roof, as identical to the number of centroids
amongst the data distribution. This algorithm is able to optimize the
K-means clustering for image segmentation in aspects of precision
and computation time. It designates the initial centroids- positions
by calculating the accumulated distance metric between each data
point and all previous centroids, and then selects data points which
have the maximum distance as new initial centroids. This algorithm
distributes all initial centroids according to the maximum
accumulated distance metric. This paper evaluates the proposed
approach for image segmentation by comparing with K-means and
Gaussian Mixture Model algorithm and involving RGB, HSV, HSL
and CIELAB color spaces. The experimental results clarify the
effectiveness of our approach to improve the segmentation quality in
aspects of precision and computational time.
Abstract: The present study presents a new approach to automatic
data clustering and classification problems in large and complex
databases and, at the same time, derives specific types of explicit rules
describing each cluster. The method works well in both sparse and
dense multidimensional data spaces. The members of the data space
can be of the same nature or represent different classes. A number
of N-dimensional ellipsoids are used for enclosing the data clouds.
Due to the geometry of an ellipsoid and its free rotation in space
the detection of clusters becomes very efficient. The method is based
on genetic algorithms that are used for the optimization of location,
orientation and geometric characteristics of the hyper-ellipsoids. The
proposed approach can serve as a basis for the development of
general knowledge systems for discovering hidden knowledge and
unexpected patterns and rules in various large databases.
Abstract: The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.