Energy Efficient Clustering Algorithm with Global and Local Re-clustering for Wireless Sensor Networks

Wireless Sensor Networks consist of inexpensive, low power sensor nodes deployed to monitor the environment and collect data. Gathering information in an energy efficient manner is a critical aspect to prolong the network lifetime. Clustering  algorithms have an advantage of enhancing the network lifetime. Current clustering algorithms usually focus on global re-clustering and local re-clustering separately. This paper, proposed a combination of those two reclustering methods to reduce the energy consumption of the network. Furthermore, the proposed algorithm can apply to homogeneous as well as heterogeneous wireless sensor networks. In addition, the cluster head rotation happens, only when its energy drops below a dynamic threshold value computed by the algorithm. The simulation result shows that the proposed algorithm prolong the network lifetime compared to existing algorithms.

Layout Based Spam Filtering

Due to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel approach to filtering based solely on layout, whose goal is not only to correctly identify spam, but also warn about major emerging threats. We propose a mathematical formulation of the email message layout and based on it we elaborate an algorithm to separate different types of emails and find the new, numerically relevant spam types.

A New Approach In Protein Folding Studies Revealed The Potential Site For Nucleation Center

A new approach to predict the 3D structures of proteins by combining the knowledge-based method and Molecular Dynamics Simulation is presented on the chicken villin headpiece subdomain (HP-36). Comparative modeling is employed as the knowledge-based method to predict the core region (Ala9-Asn28) of the protein while the remaining residues are built as extended regions (Met1-Lys8; Leu29-Phe36) which then further refined using Molecular Dynamics Simulation for 120 ns. Since the core region is built based on a high sequence identity to the template (65%) resulting in RMSD of 1.39 Å from the native, it is believed that this well-developed core region can act as a 'nucleation center' for subsequent rapid downhill folding. Results also demonstrate that the formation of the non-native contact which tends to hamper folding rate can be avoided. The best 3D model that exhibits most of the native characteristics is identified using clustering method which then further ranked based on the conformational free energies. It is found that the backbone RMSD of the best model compared to the NMR-MDavg is 1.01 Å and 3.53 Å, for the core region and the complete protein, respectively. In addition to this, the conformational free energy of the best model is lower by 5.85 kcal/mol as compared to the NMR-MDavg. This structure prediction protocol is shown to be effective in predicting the 3D structure of small globular protein with a considerable accuracy in much shorter time compared to the conventional Molecular Dynamics simulation alone.

Secure Data Aggregation Using Clusters in Sensor Networks

Wireless sensor network can be applied to both abominable and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efcient data aggregation while preserving data privacy is a challenging problem in wireless sensor networks research. In this paper, we present privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA)leverages clustering protocol and algebraic properties of polynomials. It has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results of our schemes and compare their performance to a typical data aggregation scheme TAG, where no data privacy protection is provided. Results show the efficacy and efficiency of our schemes.

Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists

We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory since the determination of any classes of similarities in such kind of data will provide to art-historians very fruitful information of Comics Art-s evolution. To establish this, we use a categorical data set and we study it by employing three categorical data clustering algorithms. The performances of these algorithms are compared each other, while interpretations of the clustering results are also given.

Improved C-Fuzzy Decision Tree for Intrusion Detection

As the number of networked computers grows, intrusion detection is an essential component in keeping networks secure. Various approaches for intrusion detection are currently being in use with each one has its own merits and demerits. This paper presents our work to test and improve the performance of a new class of decision tree c-fuzzy decision tree to detect intrusion. The work also includes identifying best candidate feature sub set to build the efficient c-fuzzy decision tree based Intrusion Detection System (IDS). We investigated the usefulness of c-fuzzy decision tree for developing IDS with a data partition based on horizontal fragmentation. Empirical results indicate the usefulness of our approach in developing the efficient IDS.

A Constrained Clustering Algorithm for the Classification of Industrial Ores

In this paper a Pattern Recognition algorithm based on a constrained version of the k-means clustering algorithm will be presented. The proposed algorithm is a non parametric supervised statistical pattern recognition algorithm, i.e. it works under very mild assumptions on the dataset. The performance of the algorithm will be tested, togheter with a feature extraction technique that captures the information on the closed two-dimensional contour of an image, on images of industrial mineral ores.

Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification

A dissimilarity measure between the empiric characteristic functions of the subsamples associated to the different classes in a multivariate data set is proposed. This measure can be efficiently computed, and it depends on all the cases of each class. It may be used to find groups of similar classes, which could be joined for further analysis, or it could be employed to perform an agglomerative hierarchical cluster analysis of the set of classes. The final tree can serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have tested this dendrogram based SVM approach with the oneagainst- one SVM approach over four publicly available data sets, three of them being microarray data. Both performances have been found equivalent, but the first solution requires a smaller number of binary SVM models.

Objective Assessment of Psoriasis Lesion Thickness for PASI Scoring using 3D Digital Imaging

Psoriasis is a chronic inflammatory skin condition which affects 2-3% of population around the world. Psoriasis Area and Severity Index (PASI) is a gold standard to assess psoriasis severity as well as the treatment efficacy. Although a gold standard, PASI is rarely used because it is tedious and complex. In practice, PASI score is determined subjectively by dermatologists, therefore inter and intra variations of assessment are possible to happen even among expert dermatologists. This research develops an algorithm to assess psoriasis lesion for PASI scoring objectively. Focus of this research is thickness assessment as one of PASI four parameters beside area, erythema and scaliness. Psoriasis lesion thickness is measured by averaging the total elevation from lesion base to lesion surface. Thickness values of 122 3D images taken from 39 patients are grouped into 4 PASI thickness score using K-means clustering. Validation on lesion base construction is performed using twelve body curvature models and show good result with coefficient of determinant (R2) is equal to 1.

Clustering Methods Applied to the Tracking of user Traces Interacting with an e-Learning System

Many research works are carried out on the analysis of traces in a digital learning environment. These studies produce large volumes of usage tracks from the various actions performed by a user. However, to exploit these data, compare and improve performance, several issues are raised. To remedy this, several works deal with this problem seen recently. This research studied a series of questions about format and description of the data to be shared. Our goal is to share thoughts on these issues by presenting our experience in the analysis of trace-based log files, comparing several approaches used in automatic classification applied to e-learning platforms. Finally, the obtained results are discussed.

A Strategy to Optimize the SPC Scheme for Mass Production of HDD Arm with ClusteringTechnique and Three-Way Control Chart

Consider a mass production of HDD arms where hundreds of CNC machines are used to manufacturer the HDD arms. According to an overwhelming number of machines and models of arm, construction of separate control chart for monitoring each HDD arm model by each machine is not feasible. This research proposed a strategy to optimize the SPC management on shop floor. The procedure started from identifying the clusters of the machine with similar manufacturing performance using clustering technique. The three way control chart ( I - MR - R ) is then applied to each clustered group of machine. This proposed research has advantageous to the manufacturer in terms of not only better performance of the SPC but also the quality management paradigm.

A Fuzzy Time Series Forecasting Model for Multi-Variate Forecasting Analysis with Fuzzy C-Means Clustering

In this study, a fuzzy integrated logical forecasting method (FILF) is extended for multi-variate systems by using a vector autoregressive model. Fuzzy time series forecasting (FTSF) method was recently introduced by Song and Chissom [1]-[2] after that Chen improved the FTSF method. Rather than the existing literature, the proposed model is not only compared with the previous FTS models, but also with the conventional time series methods such as the classical vector autoregressive model. The cluster optimization is based on the C-means clustering method. An empirical study is performed for the prediction of the chartering rates of a group of dry bulk cargo ships. The root mean squared error (RMSE) metric is used for the comparing of results of methods and the proposed method has superiority than both traditional FTS methods and also the classical time series methods.

PoPCoRN: A Power-Aware Periodic Surveillance Scheme in Convex Region using Wireless Mobile Sensor Networks

In this paper, the periodic surveillance scheme has been proposed for any convex region using mobile wireless sensor nodes. A sensor network typically consists of fixed number of sensor nodes which report the measurements of sensed data such as temperature, pressure, humidity, etc., of its immediate proximity (the area within its sensing range). For the purpose of sensing an area of interest, there are adequate number of fixed sensor nodes required to cover the entire region of interest. It implies that the number of fixed sensor nodes required to cover a given area will depend on the sensing range of the sensor as well as deployment strategies employed. It is assumed that the sensors to be mobile within the region of surveillance, can be mounted on moving bodies like robots or vehicle. Therefore, in our scheme, the surveillance time period determines the number of sensor nodes required to be deployed in the region of interest. The proposed scheme comprises of three algorithms namely: Hexagonalization, Clustering, and Scheduling, The first algorithm partitions the coverage area into fixed sized hexagons that approximate the sensing range (cell) of individual sensor node. The clustering algorithm groups the cells into clusters, each of which will be covered by a single sensor node. The later determines a schedule for each sensor to serve its respective cluster. Each sensor node traverses all the cells belonging to the cluster assigned to it by oscillating between the first and the last cell for the duration of its life time. Simulation results show that our scheme provides full coverage within a given period of time using few sensors with minimum movement, less power consumption, and relatively less infrastructure cost.

Network Anomaly Detection using Soft Computing

One main drawback of intrusion detection system is the inability of detecting new attacks which do not have known signatures. In this paper we discuss an intrusion detection method that proposes independent component analysis (ICA) based feature selection heuristics and using rough fuzzy for clustering data. ICA is to separate these independent components (ICs) from the monitored variables. Rough set has to decrease the amount of data and get rid of redundancy and Fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining- (KDDCup 1999) dataset.

Effect of Clustering on Energy Efficiency and Network Lifetime in Wireless Sensor Networks

Wireless Sensor Network is Multi hop Self-configuring Wireless Network consisting of sensor nodes. The deployment of wireless sensor networks in many application areas, e.g., aggregation services, requires self-organization of the network nodes into clusters. Efficient way to enhance the lifetime of the system is to partition the network into distinct clusters with a high energy node as cluster head. The different methods of node clustering techniques have appeared in the literature, and roughly fall into two families; those based on the construction of a dominating set and those which are based solely on energy considerations. Energy optimized cluster formation for a set of randomly scattered wireless sensors is presented. Sensors within a cluster are expected to be communicating with cluster head only. The energy constraint and limited computing resources of the sensor nodes present the major challenges in gathering the data. In this paper we propose a framework to study how partially correlated data affect the performance of clustering algorithms. The total energy consumption and network lifetime can be analyzed by combining random geometry techniques and rate distortion theory. We also present the relation between compression distortion and data correlation.

Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.

Error-Robust Nature of Genome Profiling Applied for Clustering of Species Demonstrated by Computer Simulation

Genome profiling (GP), a genotype based technology, which exploits random PCR and temperature gradient gel electrophoresis, has been successful in identification/classification of organisms. In this technology, spiddos (Species identification dots) and PaSS (Pattern similarity score) were employed for measuring the closeness (or distance) between genomes. Based on the closeness (PaSS), we can buildup phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer-simulation, providing the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for identification/classification of organisms.

Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure

Clustering categorical data is more complicated than the numerical clustering because of its special properties. Scalability and memory constraint is the challenging problem in clustering large data set. This paper presents an incremental algorithm to cluster the categorical data. Frequencies of attribute values contribute much in clustering similar categorical objects. In this paper we propose new similarity measures based on the frequencies of attribute values and its cardinalities. The proposed measures and the algorithm are experimented with the data sets from UCI data repository. Results prove that the proposed method generates better clusters than the existing one.

Cumulative Learning based on Dynamic Clustering of Hierarchical Production Rules(HPRs)

An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of details, and to focus our attention on the interesting aspects only. One of such efficient and easy to understand systems is Hierarchical Production rule (HPRs) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form Decision If < condition> Generality Specificity . HPRs systems are capable of handling taxonomical structures inherent in the knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by a HPR-tree. This paper discusses an algorithm based on cumulative learning scenario for dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from the previous episodes and also maintains summary of clusters as Synopsis to be used in the future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.

Plant Varieties Selection System

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.