Abstract: Due to heavy energy constraints in WSNs clustering is
an efficient way to manage the energy in sensors. There are many
methods already proposed in the area of clustering and research is
still going on to make clustering more energy efficient. In our paper
we are proposing a minimum spanning tree based clustering using
divide and conquer approach. The MST based clustering was first
proposed in 1970’s for large databases. Here we are taking divide and
conquer approach and implementing it for wireless sensor networks
with the constraints attached to the sensor networks. This Divide and
conquer approach is implemented in a way that we don’t have to
construct the whole MST before clustering but we just find the edge
which will be the part of the MST to a corresponding graph and
divide the graph in clusters there itself if that edge from the graph can
be removed judging on certain constraints and hence saving lot of
computation.
Abstract: A mobile ad hoc network is a network of mobile nodes
without any notion of centralized administration. In such a network,
each mobile node behaves not only as a host which runs applications
but also as a router to forward packets on behalf of others. Clustering
has been applied to routing protocols to achieve efficient
communications. A CH network expresses the connected relationship
among cluster-heads. This paper discusses the methods for
constructing a CH network, and produces the following results: (1)
The required running costs of 3 traditional methods for constructing a
CH network are not so different from each other in the static
circumstance, or in the dynamic circumstance. Their running costs in
the static circumstance do not differ from their costs in the dynamic
circumstance. Meanwhile, although the routing costs required for the
above 3 methods are not so different in the static circumstance, the
costs are considerably different from each other in the dynamic
circumstance. Their routing costs in the static circumstance are also
very different from their costs in the dynamic circumstance, and the
former is one tenths of the latter. The routing cost in the dynamic
circumstance is mostly the cost for re-routing. (2) On the strength of
the above results, we discuss new 2 methods regarding whether they
are tolerable or not in the dynamic circumstance, that is, whether the
times of re-routing are small or not. These new methods are revised
methods that are based on the traditional methods. We recommended
the method which produces the smallest routing cost in the dynamic
circumstance, therefore producing the smallest total cost.
Abstract: Microarray data profiles gene expression on a whole
genome scale, therefore, it provides a good way to study associations
between gene expression and occurrence or progression of cancer.
More and more researchers realized that microarray data is helpful
to predict cancer sample. However, the high dimension of gene
expressions is much larger than the sample size, which makes this
task very difficult. Therefore, how to identify the significant genes
causing cancer becomes emergency and also a hot and hard research
topic. Many feature selection algorithms have been proposed in
the past focusing on improving cancer predictive accuracy at the
expense of ignoring the correlations between the features. In this
work, a novel framework (named by SGS) is presented for stable gene
selection and efficient cancer prediction . The proposed framework
first performs clustering algorithm to find the gene groups where
genes in each group have higher correlation coefficient, and then
selects the significant genes in each group with Bayesian Lasso and
important gene groups with group Lasso, and finally builds prediction
model based on the shrinkage gene space with efficient classification
algorithm (such as, SVM, 1NN, Regression and etc.). Experiment
results on real world data show that the proposed framework often
outperforms the existing feature selection and prediction methods,
say SAM, IG and Lasso-type prediction model.
Abstract: In this paper we use exponential particle swarm
optimization (EPSO) to cluster data. Then we compare between
(EPSO) clustering algorithm which depends on exponential variation
for the inertia weight and particle swarm optimization (PSO)
clustering algorithm which depends on linear inertia weight. This
comparison is evaluated on five data sets. The experimental results
show that EPSO clustering algorithm increases the possibility to find
the optimal positions as it decrease the number of failure. Also show
that (EPSO) clustering algorithm has a smaller quantization error
than (PSO) clustering algorithm, i.e. (EPSO) clustering algorithm
more accurate than (PSO) clustering algorithm.
Abstract: Although backpropagation ANNs generally predict
better than decision trees do for pattern classification problems, they
are often regarded as black boxes, i.e., their predictions cannot be
explained as those of decision trees. In many applications, it is
desirable to extract knowledge from trained ANNs for the users to
gain a better understanding of how the networks solve the problems.
A new rule extraction algorithm, called rule extraction from artificial
neural networks (REANN) is proposed and implemented to extract
symbolic rules from ANNs. A standard three-layer feedforward ANN
is the basis of the algorithm. A four-phase training algorithm is
proposed for backpropagation learning. Explicitness of the extracted
rules is supported by comparing them to the symbolic rules generated
by other methods. Extracted rules are comparable with other methods
in terms of number of rules, average number of conditions for a rule,
and predictive accuracy. Extensive experimental studies on several
benchmarks classification problems, such as breast cancer, iris,
diabetes, and season classification problems, demonstrate the
effectiveness of the proposed approach with good generalization
ability.
Abstract: The paper contains a review of the literature in terms of the critical analysis of methodologies of university ranking systems. Furthermore, the initiatives supported by the European Commission (U-Map, U-Multirank) and CHE Ranking are described. Special attention is paid to the tendencies in the development of ranking systems. According to the author, the ranking organizations should abandon the classic form of ranking, namely a hierarchical ordering of universities from “the best" to “the worse". In the empirical part of this paper, using one of the method of cluster analysis called k-means clustering, the author presents university classifications of the top universities from the Shanghai Jiao Tong University-s (SJTU) Academic Ranking of World Universities (ARWU).
Abstract: In this paper we propose a new knowledge model using
the Dempster-Shafer-s evidence theory for image segmentation and
fusion. The proposed method is composed essentially of two steps.
First, mass distributions in Dempster-Shafer theory are obtained from
the membership degrees of each pixel covering the three image
components (R, G and B). Each membership-s degree is determined by
applying Fuzzy C-Means (FCM) clustering to the gray levels of the
three images. Second, the fusion process consists in defining three
discernment frames which are associated with the three images to be
fused, and then combining them to form a new frame of discernment.
The strategy used to define mass distributions in the combined
framework is discussed in detail. The proposed fusion method is
illustrated in the context of image segmentation. Experimental
investigations and comparative studies with the other previous methods
are carried out showing thus the robustness and superiority of the
proposed method in terms of image segmentation.
Abstract: This paper presents an effective method for detecting vehicles in front of the camera-assisted car during nighttime driving. The proposed method detects vehicles based on detecting vehicle headlights and taillights using techniques of image segmentation and clustering. First, to effectively extract spotlight of interest, a segmentation process based on automatic multi-level threshold method is applied on the road-scene images. Second, to spatial clustering vehicle of detecting lamps, a grouping process based on light tracking and locating vehicle lighting patterns. For simulation, we are implemented through Da-vinci 7437 DSP board with near infrared mono-camera and tested it in the urban and rural roads. Through the test, classification performances are above 97% of true positive rate evaluated on real-time environment. Our method also has good performance in the case of clear, fog and rain weather.
Abstract: K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Abstract: A new blind symbol by symbol equalizer is proposed.
The operation of the proposed equalizer is based on the geometric
properties of the two dimensional data constellation. An unsupervised
clustering technique is used to locate the clusters formed by the
received data. The symmetric properties of the clusters labels are
subsequently utilized in order to label the clusters. Following this
step, the received data are compared to clusters and decisions are
made on a symbol by symbol basis, by assigning to each data
the label of the nearest cluster. The operation of the equalizer is
investigated both in linear and nonlinear channels. The performance
of the proposed equalizer is compared to the performance of a CMAbased
blind equalizer.
Abstract: Our objective in this paper is to propose an approach
capable of clustering web messages. The clustering is carried out by
assigning, with a certain probability, texts written by the same web
user to the same cluster based on Stylometric features and using
fuzzy clustering algorithms. Focus in the present work is on
comparing the most popular algorithms in fuzzy clustering theory
namely, Fuzzy C-means, Possibilistic C-means and Fuzzy
Possibilistic C-Means.
Abstract: In this paper, Neuro-Fuzzy based Fuzzy Subtractive
Clustering Method (FSCM) and Self Tuning Fuzzy PD-like
Controller (STFPDC) were used to solve non-linearity and trajectory
problems of pitch AND yaw angles of Twin Rotor MIMO system
(TRMS). The control objective is to make the beams of TRMS reach
a desired position quickly and accurately. The proposed method
could achieve control objectives with simpler controller. To simplify
the complexity of STFPDC, ANFIS based FSCM was used to
simplify the controller and improve the response. The proposed
controllers could achieve satisfactory objectives under different input
signals. Simulation results under MATLAB/Simulink® proved the
improvement of response and superiority of simplified STFPDC on
Fuzzy Logic Controller (FLC).
Abstract: In this paper, we represent protein structure by using
graph. A protein structure database will become a graph database.
Each graph is represented by a spectral vector. We use Jacobi
rotation algorithm to calculate the eigenvalues of the normalized
Laplacian representation of adjacency matrix of graph. To measure
the similarity between two graphs, we calculate the Euclidean
distance between two graph spectral vectors. To cluster the graphs,
we use M-tree with the Euclidean distance to cluster spectral vectors.
Besides, M-tree can be used for graph searching in graph database.
Our proposal method was tested with graph database of 100 graphs
representing 100 protein structures downloaded from Protein Data
Bank (PDB) and we compare the result with the SCOP hierarchical
structure.
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Abstract: Self-organizing map (SOM) provides both clustering and visualization capabilities in mining data. Dynamic self-organizing maps such as Growing Self-organizing Map (GSOM) has been developed to overcome the problem of fixed structure in SOM to enable better representation of the discovered patterns. However, in mining large datasets or historical data the hierarchical structure of the data is also useful to view the cluster formation at different levels of abstraction. In this paper, we present a technique to generate concept trees from the GSOM. The formation of tree from different spread factor values of GSOM is also investigated and the quality of the trees analyzed. The results show that concept trees can be generated from GSOM, thus, eliminating the need for re-clustering of the data from scratch to obtain a hierarchical view of the data under study.
Abstract: Wireless sensor networks (WSNs) consist of number
of tiny, low cost and low power sensor nodes to monitor some physical phenomenon. The major limitation in these networks is the use of non-rechargeable battery having limited power supply. The
main cause of energy consumption in such networks is
communication subsystem. This paper presents an energy efficient
Cluster Cooperative Caching at Sensor (C3S) based upon grid type clustering. Sensor nodes belonging to the same cluster/grid form a
cooperative cache system for the node since the cost for
communication with them is low both in terms of energy
consumption and message exchanges. The proposed scheme uses
cache admission control and utility based data replacement policy to
ensure that more useful data is retained in the local cache of a node.
Simulation results demonstrate that C3S scheme performs better in
various performance metrics than NICoCa which is existing
cooperative caching protocol for WSNs.
Abstract: This paper applies fuzzy clustering algorithm in classifying real estate companies in China according to some general financial indexes, such as income per share, share accumulation fund, net profit margins, weighted net assets yield and shareholders' equity. By constructing and normalizing initial partition matrix, getting fuzzy similar matrix with Minkowski metric and gaining the transitive closure, the dynamic fuzzy clustering analysis for real estate companies is shown clearly that different clustered result change gradually with the threshold reducing, and then, it-s shown there is the similar relationship with the prices of those companies in stock market. In this way, it-s great valuable in contrasting the real estate companies- financial condition in order to grasp some good chances of investment, and so on.
Abstract: In text categorization problem the most used method
for documents representation is based on words frequency vectors
called VSM (Vector Space Model). This representation is based only
on words from documents and in this case loses any “word context"
information found in the document. In this article we make a
comparison between the classical method of document representation
and a method called Suffix Tree Document Model (STDM) that is
based on representing documents in the Suffix Tree format. For the
STDM model we proposed a new approach for documents
representation and a new formula for computing the similarity
between two documents. Thus we propose to build the suffix tree
only for any two documents at a time. This approach is faster, it has
lower memory consumption and use entire document representation
without using methods for disposing nodes. Also for this method is
proposed a formula for computing the similarity between documents,
which improves substantially the clustering quality. This
representation method was validated using HAC - Hierarchical
Agglomerative Clustering. In this context we experiment also the
stemming influence in the document preprocessing step and highlight
the difference between similarity or dissimilarity measures to find
“closer" documents.
Abstract: The self-organizing map (SOM) model is a well-known neural network model with wide spread of applications. The main characteristics of SOM are two-fold, namely dimension reduction and topology preservation. Using SOM, a high-dimensional data space will be mapped to some low-dimensional space. Meanwhile, the topological relations among data will be preserved. With such characteristics, the SOM was usually applied on data clustering and visualization tasks. However, the SOM has main disadvantage of the need to know the number and structure of neurons prior to training, which are difficult to be determined. Several schemes have been proposed to tackle such deficiency. Examples are growing/expandable SOM, hierarchical SOM, and growing hierarchical SOM. These schemes could dynamically expand the map, even generate hierarchical maps, during training. Encouraging results were reported. Basically, these schemes adapt the size and structure of the map according to the distribution of training data. That is, they are data-driven or dataoriented SOM schemes. In this work, a topic-oriented SOM scheme which is suitable for document clustering and organization will be developed. The proposed SOM will automatically adapt the number as well as the structure of the map according to identified topics. Unlike other data-oriented SOMs, our approach expands the map and generates the hierarchies both according to the topics and their characteristics of the neurons. The preliminary experiments give promising result and demonstrate the plausibility of the method.
Abstract: An adaptive spatial Gaussian mixture model is proposed for clustering based color image segmentation. A new clustering objective function which incorporates the spatial information is introduced in the Bayesian framework. The weighting parameter for controlling the importance of spatial information is made adaptive to the image content to augment the smoothness towards piecewisehomogeneous region and diminish the edge-blurring effect and hence the name adaptive spatial finite mixture model. The proposed approach is compared with the spatially variant finite mixture model for pixel labeling. The experimental results with synthetic and Berkeley dataset demonstrate that the proposed method is effective in improving the segmentation and it can be employed in different practical image content understanding applications.