An Energy-Efficient Distributed Unequal Clustering Protocol for Wireless Sensor Networks

Wireless sensor networks have been extensively deployed and researched. One of the major issues in wireless sensor networks is developing an energy-efficient clustering protocol. Clustering algorithms provide an effective way to prolong the lifetime of a wireless sensor network. In this paper, we compare several clustering protocols that significantly affect the balance of energy consumption. We propose an Energy-Efficient Distributed Unequal Clustering (EEDUC) algorithm, which provides a new way of creating distributed clusters. In EEDUC, each sensor node sets a waiting time, which is a function of its residual energy and its number of neighboring nodes. EEDUC uses the waiting time to distribute cluster heads. We also propose an unequal clustering mechanism to solve the hot-spot problem. Simulation results show that EEDUC distributes the cluster heads, balances energy consumption well among the cluster heads, and increases the network lifetime.
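
As a rough illustration of the waiting-time idea, the Python sketch below lets each node compute a timer from its residual energy and neighbor count; the node whose timer expires first becomes a cluster head and suppresses its neighbors. The abstract does not give the actual formula, so the weighted form in `waiting_time`, the direction of the dependence (more energy and more neighbors mean a shorter wait), the parameters `alpha` and `t_max`, the fixed competition `radius`, and all function names are assumptions made for this example. The unequal cluster sizes used against the hot-spot problem are not modeled.

```python
import math

def waiting_time(residual, e_max, neighbors, n_max, t_max=1.0, alpha=0.5):
    """Hypothetical timer: shorter for nodes with more energy and more neighbors."""
    energy_term = 1.0 - residual / e_max
    density_term = 1.0 - neighbors / n_max
    return t_max * (alpha * energy_term + (1.0 - alpha) * density_term)

def elect_heads(nodes, radius, e_max=1.0):
    """nodes: list of (x, y, residual_energy). Returns (head indices, timers)."""
    def near(i, j):
        return math.dist(nodes[i][:2], nodes[j][:2]) <= radius

    n = len(nodes)
    counts = [sum(near(i, j) for j in range(n) if j != i) for i in range(n)]
    n_max = max(max(counts), 1)  # avoid division by zero for isolated nodes
    waits = [waiting_time(nodes[i][2], e_max, counts[i], n_max) for i in range(n)]

    heads = []
    # Timers expire in increasing order; a node becomes a head only if no
    # already-elected head lies within its competition radius.
    for i in sorted(range(n), key=waits.__getitem__):
        if not any(near(i, h) for h in heads):
            heads.append(i)
    return heads, waits

nodes = [(0, 0, 0.9), (1, 0, 0.5), (0, 1, 0.4), (10, 10, 0.8), (11, 10, 0.3)]
heads, waits = elect_heads(nodes, radius=2.0)
print(heads)
```

Simulating the timers by sorting them is a centralized shortcut; in a real deployment each node would run its own timer and listen for head announcements.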

Density Clustering Based On Radius of Data (DCBRD)

Clustering algorithms are attractive for the task of class identification in spatial databases. However, their application to large spatial databases raises the following requirements: minimal domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density-based clustering algorithm (DCBRD) is presented that relies on knowledge acquired from the data by dividing the data space into overlapping regions. The proposed algorithm discovers arbitrarily shaped clusters, requires no input parameters, and uses the same definitions as the DBSCAN algorithm. We performed an experimental evaluation of its effectiveness and efficiency and compared the results with those of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is highly efficient in discovering clusters of arbitrary shape and size.

Sample-Weighted Fuzzy Clustering with Regularizations

Although much research in cluster analysis has considered feature weights, little effort has been devoted to sample weights. Recently, Yu et al. (2011) considered a probability distribution over a data set to represent its sample weights and then proposed sample-weighted clustering algorithms. In this paper, we give a sample-weighted version of generalized fuzzy clustering regularization (GFCR), called the sample-weighted GFCR (SW-GFCR). Several experiments are conducted. The experimental results and comparisons demonstrate that the proposed SW-GFCR is more effective than most clustering algorithms.

Validation Testing for Temporal Neural Networks for RBF Recognition

A neuron can emit spikes on an irregular time basis, and averaging over a certain time window would discard a lot of information. It is known that in the context of fast information processing there is not sufficient time to sample an average firing rate of the spiking neurons. The present work shows that spiking neurons are capable of computing radial basis functions by storing the relevant information in the neurons' delays. One of the fundamental findings of this research is that using overlapping receptive fields to encode the data patterns increases the network's clustering capacity. The clustering algorithm discussed here is interesting from the computer science and neuroscience points of view as well as from a perspective.

A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrarily shaped embedded clusters as well as multi-density clusters over large spatial datasets. To handle massive datasets, we implemented our method using a 'shared-nothing' architecture in which multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup, and cluster quality.

MiSense Hierarchical Cluster-Based Routing Algorithm (MiCRA) for Wireless Sensor Networks

Wireless sensor networks (WSNs) are currently receiving significant attention due to their unlimited potential. These networks are used for various applications, such as habitat monitoring, automation, agriculture, and security. Efficient node energy utilization is one of the important performance factors in wireless sensor networks because sensor nodes operate with limited battery power. In this paper, we propose the MiSense hierarchical cluster-based routing algorithm (MiCRA) to extend the lifetime of sensor networks and to maintain balanced energy consumption among nodes. MiCRA is an extension of the HEED algorithm with two levels of cluster heads. The performance of the proposed protocol has been examined and evaluated through a simulation study. The simulation results clearly show that MiCRA performs better than HEED in terms of lifetime. Indeed, the proposed MiCRA protocol can effectively extend the network lifetime without other critical overheads or performance degradation. Compared to the HEED algorithm, MiCRA saves about 35% of energy during the clustering process and 65% during the routing process.

Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance

In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides the cell into two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum of squared errors of the two cells as much as possible while keeping the two cells as far apart as possible. Cells are partitioned one at a time until the number of cells equals the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective and converges to better clustering results than the random initialization method. The research also indicates that the proposed algorithm greatly improves the likelihood of every cluster containing some data.
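
The partitioning scheme described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the cut position is chosen only to minimize the summed squared error of the two halves (the "far apart" criterion is omitted), the cell with the largest squared error is split next (the abstract does not say which cell is chosen), and the cut is made between sorted points rather than at a geometric plane. The function names `sse`, `split_cell`, and `initial_centers` are hypothetical.

```python
from statistics import fmean, pvariance

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(fmean(p[d] for p in points) for d in range(len(points[0])))

def sse(points):
    """Sum of squared distances from each point to the cell centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    return sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))

def split_cell(points):
    """Cut a cell perpendicular to its highest-variance axis."""
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    # Try every cut between consecutive sorted points; keep the cheapest one.
    cut = min(range(1, len(ordered)),
              key=lambda i: sse(ordered[:i]) + sse(ordered[i:]))
    return ordered[:cut], ordered[cut:]

def initial_centers(points, k):
    """Split cells one at a time until there are k; return their centroids."""
    cells = [list(points)]
    while len(cells) < k:
        candidates = [i for i, c in enumerate(cells) if len(c) > 1]
        if not candidates:
            break  # fewer distinct splittable cells than requested clusters
        # Assumption: split the cell with the largest squared error next.
        worst = max(candidates, key=lambda i: sse(cells[i]))
        cells.extend(split_cell(cells.pop(worst)))
    return [centroid(c) for c in cells]

points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
centers = sorted(initial_centers(points, 2))
print(centers)
```

The returned centroids would then be passed to an ordinary K-means run as its starting centers.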

Neural Networks Learning Improvement using the K-Means Clustering Algorithm to Detect Network Intrusions

In the present work, we propose a new technique to enhance the learning capabilities and reduce the computational intensity of a competitive learning multi-layered neural network using the K-means clustering algorithm. The proposed model uses a multi-layered network architecture with a back-propagation learning mechanism. The K-means algorithm is first applied to the training dataset to reduce the number of samples presented to the neural network by automatically selecting an optimal set of samples. The results demonstrate that, when applied to the KDD99 dataset, the proposed technique performs exceptionally well in terms of both accuracy and computation time compared to a standard learning scheme that uses the full dataset.

Fuzzy Control of the Air Conditioning System at Different Operating Pressures

The present work demonstrates the design and simulation of a fuzzy controller for an air conditioning system at different pressures. A first-order Sugeno fuzzy inference system is used to model the system and create the controller. In addition, the heat transfer rate and the mass flow rate of water injected into or withdrawn from the air conditioning system are estimated by the fuzzy IF-THEN rules. The approach starts by generating the input/output data. Then, the subtractive clustering algorithm, together with least squares estimation (LSE), generates the fuzzy rules that describe the relationship between the input and output data. The fuzzy rules are tuned by an Adaptive Neuro-Fuzzy Inference System (ANFIS). The results show that as the pressure increases, the water flow rate and heat transfer rate decrease within the lower ranges of inlet dry bulb temperature. On the other hand, as the pressure increases, the water flow rate and heat transfer rate increase within the higher ranges of inlet dry bulb temperature. The inflection in the pressure effect trend occurs at lower temperatures as the inlet air humidity increases.

Folksonomy-based Recommender Systems with User's Recent Preferences

Social bookmarking is an environment in which a user's interests gradually change over time, so that the tag data associated with the current period is usually more important than tag data from periods further in the past. This implies that, in a social tagging system, the items most recently tagged by the user are more relevant than older items. This study proposes a novel recommender system that considers the users' recent tag preferences. The proposed system includes the following stages: grouping similar users into clusters using an E-M clustering algorithm, finding similar resources based on the user's bookmarks, and recommending the top-N items to the target user. The study examines the system's information retrieval performance using a dataset from del.icio.us, a well-known social bookmarking web site. Experimental results show that the proposed system is more effective than traditional approaches.

Performance Comparison of Particle Swarm Optimization with Traditional Clustering Algorithms used in Self-Organizing Map

The self-organizing map (SOM) is a well-known data reduction technique used in data mining. It can reveal structure in data sets through data visualization that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOM, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. The PSO algorithm uses the so-called U-matrix of the SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably with boundary detection through traditional algorithms, namely k-means and hierarchical approaches, which are normally used to interpret the output of SOM.

On the Noise Distance in Robust Fuzzy C-Means

In recent decades, a number of robust fuzzy clustering algorithms have been proposed to partition data sets affected by noise and outliers. Robust fuzzy C-means (robust-FCM) is certainly one of the best known among these algorithms. In robust-FCM, noise is modeled as a separate cluster and is characterized by a prototype that has a constant distance δ from all data points. The distance δ determines the boundary of the noise cluster and is therefore a critical parameter of the algorithm. Although some approaches have been proposed to automatically determine the most suitable δ for a specific application, an efficient and fully satisfactory solution does not exist to date. The aim of this paper is to propose a novel method to compute the optimal δ based on an analysis of the distribution of the percentage of objects assigned to the noise cluster in repeated executions of robust-FCM with decreasing values of δ. The extremely encouraging results obtained on some data sets found in the literature are shown and discussed.
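
To make the role of δ concrete, the following Python sketch computes fuzzy memberships with one extra noise cluster whose prototype sits at distance δ from every point, then reports the fraction of points assigned to noise for decreasing δ. It uses the standard noise-clustering membership formula with fuzzifier `m`; the cluster centers are held fixed rather than re-estimated by full robust-FCM iterations, and the function names and sample data are assumptions for illustration. It does not reproduce the paper's procedure for selecting the optimal δ.

```python
import math

def memberships(point, centers, delta, m=2.0):
    """Memberships to each center, followed by the noise-cluster membership."""
    exponent = 1.0 / (m - 1.0)
    # Squared distances; the noise prototype is at distance delta from any point.
    d2 = [max(math.dist(point, c) ** 2, 1e-12) for c in centers] + [delta ** 2]
    inverse = [(1.0 / d) ** exponent for d in d2]
    total = sum(inverse)
    return [v / total for v in inverse]

def noise_fraction(points, centers, delta, m=2.0):
    """Fraction of points whose largest membership is the noise cluster."""
    noisy = 0
    for p in points:
        u = memberships(p, centers, delta, m)
        if u[-1] > max(u[:-1]):
            noisy += 1
    return noisy / len(points)

points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (2.5, 2.5), (20, 20)]
centers = [(0, 0), (5, 5)]  # fixed here; robust-FCM would update them
for delta in (10.0, 3.0, 0.05):
    print(delta, noise_fraction(points, centers, delta))
```

As δ shrinks, the noise cluster claims more points, which is the quantity whose distribution the abstract proposes to analyze.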

Similarity Measures and Weighted Fuzzy C-Mean Clustering Algorithm

In this paper, we study the fuzzy c-means clustering algorithm combined with the principal components method. Demonstrative analysis indicates that the new clustering method performs better than some existing clustering algorithms. We also consider the validity of the clustering method.

Identification of Nonlinear Systems Using Radial Basis Function Neural Network

This paper uses the radial basis function neural network (RBFNN) for the identification of nonlinear systems. Five nonlinear systems are used to examine the behavior of the RBFNN in system modeling: a dual tank system, a single tank system, a DC motor system, and two academic models. The feed-forward method is used in this work for modeling the nonlinear dynamic models. The K-means clustering algorithm is used to select the centers of the radial basis function network because it is reliable, offers fast convergence, and can handle large data sets. The least mean square method is used to adjust the weights of the output layer, and the Euclidean distance method is used to determine the width of the Gaussian function.
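
The training pipeline outlined above (centers, then widths, then output weights) might look roughly like the Python sketch below. Several pieces are assumptions, since the abstract gives no formulas: the centers are hard-coded in place of a K-means run, the width uses the common heuristic σ = d_max / √(2K) based on the largest Euclidean distance between centers, a bias term is added, and the target is a toy sine curve rather than one of the five systems. All function names are hypothetical.

```python
import math
import random

def rbf_features(x, centers, width):
    """Gaussian activations for each center plus a constant bias term."""
    acts = [math.exp(-math.dist(x, c) ** 2 / (2.0 * width ** 2)) for c in centers]
    return acts + [1.0]

def width_from_centers(centers):
    """Assumed heuristic: sigma = d_max / sqrt(2K)."""
    d_max = max(math.dist(a, b) for a in centers for b in centers)
    return d_max / math.sqrt(2.0 * len(centers))

def predict(x, centers, width, weights):
    return sum(w * f for w, f in zip(weights, rbf_features(x, centers, width)))

def mse(samples, centers, width, weights):
    errors = [(y - predict(x, centers, width, weights)) ** 2 for x, y in samples]
    return sum(errors) / len(errors)

def train_lms(samples, centers, width, rate=0.05, epochs=200, seed=0):
    """Least mean square (Widrow-Hoff) updates of the output-layer weights."""
    rng = random.Random(seed)
    weights = [0.0] * (len(centers) + 1)
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            feats = rbf_features(x, centers, width)
            err = y - sum(w * f for w, f in zip(weights, feats))
            weights = [w + rate * err * f for w, f in zip(weights, feats)]
    return weights

# Toy data: samples of sin(x) on [0, 6]; inputs are 1-D tuples.
samples = [((0.5 * i,), math.sin(0.5 * i)) for i in range(13)]
centers = [(1.0,), (3.0,), (5.0,)]  # stand-in for centers found by K-means
width = width_from_centers(centers)
weights = train_lms(samples, centers, width)
print(mse(samples, centers, width, weights))
```

Only the output weights are trained here; the centers and the width stay fixed once chosen, which is what keeps the weight update a simple linear LMS problem.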

A New Method in Detection of Ceramic Tiles Color Defects Using Genetic C-Means Algorithm

In this paper, an algorithm is used to detect color defects in ceramic tiles. First, the image of a normal tile is clustered using the Genetic C-means Clustering Algorithm (GCMA), which yields the best cluster centers. C-means is a common clustering algorithm that optimizes an objective function based on a distance measure between the data points and the cluster centers in the data space. Here, the objective function is the mean square error. After the best centers are found, each pixel of the image is assigned to the cluster with the closest center. Then, the maximum error of each cluster is computed: for each cluster, the maximum error is the largest distance between its center and any pixel that belongs to it. After the errors are computed, all pixels of the defective tile image are clustered based on the centers obtained from the normal tile image in the previous stage. Pixels whose distance from their cluster center exceeds the maximum error of that cluster are considered defective.
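
The thresholding stage described above can be sketched in Python as follows. The sketch assumes the cluster centers are already available (the genetic search that GCMA uses to find them is not shown), treats pixels as RGB tuples compared by Euclidean distance, and uses hypothetical function names and sample values.

```python
import math

def nearest(pixel, centers):
    """Index of the closest center and the distance to it."""
    dists = [math.dist(pixel, c) for c in centers]
    i = min(range(len(centers)), key=dists.__getitem__)
    return i, dists[i]

def max_errors(normal_pixels, centers):
    """Per cluster, the largest center-to-pixel distance seen on the normal tile."""
    errors = [0.0] * len(centers)
    for p in normal_pixels:
        i, d = nearest(p, centers)
        errors[i] = max(errors[i], d)
    return errors

def defective_pixels(test_pixels, centers, errors):
    """Indices of pixels lying farther from their center than any normal pixel did."""
    flagged = []
    for k, p in enumerate(test_pixels):
        i, d = nearest(p, centers)
        if d > errors[i]:
            flagged.append(k)
    return flagged

centers = [(200, 200, 200), (50, 50, 50)]  # assumed output of the clustering step
normal = [(198, 200, 202), (205, 200, 200), (50, 52, 50), (48, 50, 50)]
errors = max_errors(normal, centers)

test = [(201, 200, 200), (120, 120, 120), (50, 50, 55)]
print(errors, defective_pixels(test, centers, errors))
```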

Analyzing The Effect of Variable Round Time for Clustering Approach in Wireless Sensor Networks

Because wireless sensor networks are energy-constrained, the energy efficiency of sensor nodes is the main design issue. Clustering of nodes is an energy-efficient approach: it prolongs the lifetime of wireless sensor networks by avoiding long-distance communication. Clustering algorithms operate in rounds, and the performance of a clustering algorithm depends on the round time. A large round time consumes more of the cluster heads' energy, while a small round time causes frequent re-clustering. Existing clustering algorithms therefore trade off these effects and calculate the round time from the initial parameters of the network. However, it is not appropriate to use a round time based on initial parameters throughout the network lifetime, because wireless sensor networks are dynamic in nature (nodes can be added to the network, and some nodes run out of energy). In this paper, a variable round time approach is proposed that calculates the round time from the number of active nodes remaining in the field. The proposed approach makes the clustering algorithm adaptive to network dynamics. For simulation, the approach is implemented with LEACH in NS-2, and the results show a 6% increase in network lifetime, a 7% increase in the 50% node death time, and a 5% improvement in the data units gathered at the base station.

Improving RBF Networks Classification Performance by using K-Harmonic Means

In this paper, a clustering algorithm named K-Harmonic means (KHM) is employed in the training of Radial Basis Function Networks (RBFNs). KHM organizes the data into clusters and determines the centres of the basis functions. The popular clustering algorithms K-means (KM) and Fuzzy c-means (FCM) are highly dependent on the initial identification of elements that represent the clusters well. KHM avoids this problem, which leads to improved classification performance compared to the other clustering algorithms. The classification accuracy of KM, FCM, and KHM is compared on the benchmark data sets Iris Plant, Diabetes, and Breast Cancer. RBFN training with the KHM algorithm shows better accuracy in classification problems.

Binary Classification Tree with Tuned Observation-based Clustering

There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should group similar classes together. Many algorithms measure similarity in terms of the distance between class centroids: classes are grouped together by a clustering algorithm when the distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that the performance of BCT-TOB is comparable to that of other algorithms.

A Text Clustering System based on k-means Type Subspace Clustering and Ontology

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional, and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To help understand and interpret the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.

Minimal Spanning Tree based Fuzzy Clustering

Most fuzzy clustering algorithms have some shortcomings: for example, they are not able to detect clusters with convex shapes, the number of clusters must be known a priori, and they suffer from numerical problems such as sensitivity to the initialization. This paper studies the synergistic combination of the hierarchical and graph-theoretic minimal spanning tree based clustering algorithm with the partitional Gath-Geva fuzzy clustering algorithm. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to reduce the number of heuristically defined parameters of these algorithms, and thereby the influence of the user on the clustering results. For the analysis of the resulting fuzzy clusters, a new tool based on a fuzzy similarity measure is presented. The calculated similarities of the clusters can be used for hierarchical clustering of the resulting fuzzy clusters, which is useful for cluster merging and for the visualization of the clustering results. As the examples used to illustrate the operation of the new algorithm show, the proposed algorithm can detect clusters of arbitrary shape and does not suffer from the numerical problems of the classical Gath-Geva fuzzy clustering algorithm.