Abstract: In this paper a Pattern Recognition algorithm based on
a constrained version of the k-means clustering algorithm will be
presented. The proposed algorithm is a non parametric supervised
statistical pattern recognition algorithm, i.e. it works under very mild
assumptions on the dataset. The performance of the algorithm will
be tested, togheter with a feature extraction technique that captures
the information on the closed two-dimensional contour of an image,
on images of industrial mineral ores.
Abstract: In this paper, the periodic surveillance scheme has
been proposed for any convex region using mobile wireless sensor
nodes. A sensor network typically consists of fixed number of
sensor nodes which report the measurements of sensed data such as
temperature, pressure, humidity, etc., of its immediate proximity
(the area within its sensing range). For the purpose of sensing an
area of interest, there are adequate number of fixed sensor
nodes required to cover the entire region of interest. It implies
that the number of fixed sensor nodes required to cover a given
area will depend on the sensing range of the sensor as well as
deployment strategies employed. It is assumed that the sensors to
be mobile within the region of surveillance, can be mounted on
moving bodies like robots or vehicle. Therefore, in our
scheme, the surveillance time period determines the number of
sensor nodes required to be deployed in the region of interest.
The proposed scheme comprises of three algorithms namely:
Hexagonalization, Clustering, and Scheduling, The first algorithm
partitions the coverage area into fixed sized hexagons that
approximate the sensing range (cell) of individual sensor node.
The clustering algorithm groups the cells into clusters, each of
which will be covered by a single sensor node. The later
determines a schedule for each sensor to serve its respective cluster.
Each sensor node traverses all the cells belonging to the cluster
assigned to it by oscillating between the first and the last cell for
the duration of its life time. Simulation results show that our
scheme provides full coverage within a given period of time using
few sensors with minimum movement, less power consumption,
and relatively less infrastructure cost.
Abstract: Wireless Sensor Network is Multi hop Self-configuring
Wireless Network consisting of sensor nodes. The deployment of
wireless sensor networks in many application areas, e.g., aggregation
services, requires self-organization of the network nodes into clusters.
Efficient way to enhance the lifetime of the system is to partition the
network into distinct clusters with a high energy node as cluster head.
The different methods of node clustering techniques have appeared in
the literature, and roughly fall into two families; those based on the
construction of a dominating set and those which are based solely on
energy considerations. Energy optimized cluster formation for a set
of randomly scattered wireless sensors is presented. Sensors within a
cluster are expected to be communicating with cluster head only. The
energy constraint and limited computing resources of the sensor nodes
present the major challenges in gathering the data. In this paper we
propose a framework to study how partially correlated data affect the
performance of clustering algorithms. The total energy consumption
and network lifetime can be analyzed by combining random geometry
techniques and rate distortion theory. We also present the relation
between compression distortion and data correlation.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns where different genes are co-related based on a
subset of conditions in gene expression analysis. Association rules
mining is an efficient approach to achieve biclustering as in
BIMODULE algorithm but it is sensitive to the value given to its
input parameters and the discretization procedure used in the
preprocessing step, also when noise is present, classical association
rules miners discover multiple small fragments of the true bicluster,
but miss the true bicluster itself. This paper formally presents a
generalized noise tolerant bicluster model, termed as μBicluster. An
iterative algorithm termed as BIDENS based on the proposed model
is introduced that can discover a set of k possibly overlapping
biclusters simultaneously. Our model uses a more flexible method to
partition the dimensions to preserve meaningful and significant
biclusters. The proposed algorithm allows discovering biclusters that
hard to be discovered by BIMODULE. Experimental study on yeast,
human gene expression data and several artificial datasets shows that
our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: Fuzzy C-means Clustering algorithm (FCM) is a
method that is frequently used in pattern recognition. It has the
advantage of giving good modeling results in many cases, although,
it is not capable of specifying the number of clusters by itself. In
FCM algorithm most researchers fix weighting exponent (m) to a
conventional value of 2 which might not be the appropriate for all
applications. Consequently, the main objective of this paper is to use
the subtractive clustering algorithm to provide the optimal number of
clusters needed by FCM algorithm by optimizing the parameters of
the subtractive clustering algorithm by an iterative search approach
and then to find an optimal weighting exponent (m) for the FCM
algorithm. In order to get an optimal number of clusters, the iterative
search approach is used to find the optimal single-output Sugenotype
Fuzzy Inference System (FIS) model by optimizing the
parameters of the subtractive clustering algorithm that give minimum
least square error between the actual data and the Sugeno fuzzy
model. Once the number of clusters is optimized, then two
approaches are proposed to optimize the weighting exponent (m) in
the FCM algorithm, namely, the iterative search approach and the
genetic algorithms. The above mentioned approach is tested on the
generated data from the original function and optimal fuzzy models
are obtained with minimum error between the real data and the
obtained fuzzy models.
Abstract: In the past few years, the use of wireless sensor networks (WSNs) potentially increased in applications such as intrusion detection, forest fire detection, disaster management and battle field. Sensor nodes are generally battery operated low cost devices. The key challenge in the design and operation of WSNs is to prolong the network life time by reducing the energy consumption among sensor nodes. Node clustering is one of the most promising techniques for energy conservation. This paper presents a novel clustering algorithm which maximizes the network lifetime by reducing the number of communication among sensor nodes. This approach also includes new distributed cluster formation technique that enables self-organization of large number of nodes, algorithm for maintaining constant number of clusters by prior selection of cluster head and rotating the role of cluster head to evenly distribute the energy load among all sensor nodes.
Abstract: Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-art techniques. ISC determines the input parameter such as є – distance at various levels of Subspace Clustering which helps in finding meaningful clusters. The uniform parameters approach is not suitable for different kind of databases. ISC implements dynamic and adaptive determination of Meaningful clustering parameters based on hierarchical filtering approach. Third and most important feature of ISC is the ability of incremental learning and dynamic inclusion and exclusions of subspaces which lead to better cluster formation.
Abstract: K-Means (KM) is considered one of the major
algorithms widely used in clustering. However, it still has some
problems, and one of them is in its initialization step where it is
normally done randomly. Another problem for KM is that it
converges to local minima. Genetic algorithms are one of the
evolutionary algorithms inspired from nature and utilized in the field
of clustering. In this paper, we propose two algorithms to solve the
initialization problem, Genetic Algorithm Initializes KM (GAIK) and
KM Initializes Genetic Algorithm (KIGA). To show the effectiveness
and efficiency of our algorithms, a comparative study was done
among GAIK, KIGA, Genetic-based Clustering Algorithm (GCA),
and FCM [19].
Abstract: In this paper a one-dimension Self Organizing Map
algorithm (SOM) to perform feature selection is presented. The
algorithm is based on a first classification of the input dataset on a
similarity space. From this classification for each class a set of
positive and negative features is computed. This set of features is
selected as result of the procedure. The procedure is evaluated on an
in-house dataset from a Knowledge Discovery from Text (KDT)
application and on a set of publicly available datasets used in
international feature selection competitions. These datasets come
from KDT applications, drug discovery as well as other applications.
The knowledge of the correct classification available for the training
and validation datasets is used to optimize the parameters for positive
and negative feature extractions. The process becomes feasible for
large and sparse datasets, as the ones obtained in KDT applications,
by using both compression techniques to store the similarity matrix
and speed up techniques of the Kohonen algorithm that take
advantage of the sparsity of the input matrix. These improvements
make it feasible, by using the grid, the application of the
methodology to massive datasets.
Abstract: Most of the biclustering/projected clustering algorithms are based either on the Euclidean distance or correlation coefficient which capture only linear relationships. However, in many applications, like gene expression data and word-document data, non linear relationships may exist between the objects. Mutual Information between two variables provides a more general criterion to investigate dependencies amongst variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing algorithms for biclustering have used mutual information as a similarity measure. We present the experimental results on synthetic data as well as on the yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.
Abstract: Mobile ad-hoc networks (MANETs) are a form of
wireless networks which do not require a base station for providing
network connectivity. Mobile ad-hoc networks have many
characteristics which distinguish them from other wireless networks
which make routing in such networks a challenging task. Cluster
based routing is one of the routing schemes for MANETs in which
various clusters of mobile nodes are formed with each cluster having
its own clusterhead which is responsible for routing among clusters.
In this paper we have proposed and implemented a distributed
weighted clustering algorithm for MANETs. This approach is based
on combined weight metric that takes into account several system
parameters like the node degree, transmission range, energy and
mobility of the nodes. We have evaluated the performance of
proposed scheme through simulation in various network situations.
Simulation results show that proposed scheme outperforms the
original distributed weighted clustering algorithm (DWCA).
Abstract: A new dynamic clustering approach (DCPSO), based
on Particle Swarm Optimization, is proposed. This approach is
applied to unsupervised image classification. The proposed approach
automatically determines the "optimum" number of clusters and
simultaneously clusters the data set with minimal user interference.
The algorithm starts by partitioning the data set into a relatively large
number of clusters to reduce the effects of initial conditions. Using
binary particle swarm optimization the "best" number of clusters is
selected. The centers of the chosen clusters is then refined via the Kmeans
clustering algorithm. The experiments conducted show that
the proposed approach generally found the "optimum" number of
clusters on the tested images.
Abstract: This paper represents four unsupervised clustering algorithms namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer that previously works have not been used for network traffic classification. The methodology, the result, the products of the cluster and evaluation of these algorithms with efficiency of each algorithm from accuracy are shown. Otherwise, the efficiency of these algorithms considering form the time that it use to generate the cluster quickly and correctly. Our work study and test the best algorithm by using classify traffic anomaly in network traffic with different attribute that have not been used before. We analyses the algorithm that have the best efficiency or the best learning and compare it to the previously used (K-Means). Our research will be use to develop anomaly detection system to more efficiency and more require in the future.
Abstract: Color image segmentation plays an important role in
computer vision and image processing areas. In this paper, the
features of Volterra filter are utilized for color image segmentation.
The discrete Volterra filter exhibits both linear and nonlinear
characteristics. The linear part smoothes the image features in
uniform gray zones and is used for getting a gross representation of
objects of interest. The nonlinear term compensates for the blurring
due to the linear term and preserves the edges which are mainly used
to distinguish the various objects. The truncated quadratic Volterra
filters are mainly used for edge preserving along with Gaussian noise
cancellation. In our approach, the segmentation is based on K-means
clustering algorithm in HSI space. Both the hue and the intensity
components are fully utilized. For hue clustering, the special cyclic
property of the hue component is taken into consideration. The
experimental results show that the proposed technique segments the
color image while preserving significant features and removing noise
effects.
Abstract: Image clustering is a process of grouping images
based on their similarity. The image clustering usually uses the color
component, texture, edge, shape, or mixture of two components, etc.
This research aims to explore image clustering using color
composition. In order to complete this image clustering, three main
components should be considered, which are color space, image
representation (feature extraction), and clustering method itself. We
aim to explore which composition of these factors will produce the
best clustering results by combining various techniques from the
three components. The color spaces use RGB, HSV, and L*a*b*
method. The image representations use Histogram and Gaussian
Mixture Model (GMM), whereas the clustering methods use KMeans
and Agglomerative Hierarchical Clustering algorithm. The
results of the experiment show that GMM representation is better
combined with RGB and L*a*b* color space, whereas Histogram is
better combined with HSV. The experiments also show that K-Means
is better than Agglomerative Hierarchical for images clustering.
Abstract: This paper presents a supervised clustering algorithm,
namely Grid-Based Supervised Clustering (GBSC), which is able to
identify clusters of any shapes and sizes without presuming any
canonical form for data distribution. The GBSC needs no prespecified
number of clusters, is insensitive to the order of the input
data objects, and is capable of handling outliers. Built on the
combination of grid-based clustering and density-based clustering,
under the assistance of the downward closure property of density
used in bottom-up subspace clustering, the GBSC can notably reduce
its search space to avoid the memory confinement situation during its
execution. On two-dimension synthetic datasets, the GBSC can
identify clusters with different shapes and sizes correctly. The GBSC
also outperforms other five supervised clustering algorithms when
the experiments are performed on some UCI datasets.
Abstract: In Data mining, Fuzzy clustering algorithms have
demonstrated advantage over crisp clustering algorithms in dealing
with the challenges posed by large collections of vague and uncertain
natural data. This paper reviews concept of fuzzy logic and fuzzy
clustering. The classical fuzzy c-means algorithm is presented and its
limitations are highlighted. Based on the study of the fuzzy c-means
algorithm and its extensions, we propose a modification to the cmeans
algorithm to overcome the limitations of it in calculating the
new cluster centers and in finding the membership values with
natural data. The efficiency of the new modified method is
demonstrated on real data collected for Bhutan-s Gross National
Happiness (GNH) program.
Abstract: This paper presents a software quality support tool, a
Java source code evaluator and a code profiler based on
computational intelligence techniques. It is Java prototype software
developed by AI Group [1] from the Research Laboratories at
Universidad de Palermo: an Intelligent Java Analyzer (in Spanish:
Analizador Java Inteligente, AJI). It represents a new approach to
evaluate and identify inaccurate source code usage and transitively,
the software product itself.
The aim of this project is to provide the software development
industry with a new tool to increase software quality by extending
the value of source code metrics through computational intelligence.
Abstract: Wireless Sensor Networks consist of small battery
powered devices with limited energy resources. once deployed, the
small sensor nodes are usually inaccessible to the user, and thus
replacement of the energy source is not feasible. Hence, One of the
most important issues that needs to be enhanced in order to improve
the life span of the network is energy efficiency. to overcome this
demerit many research have been done. The clustering is the one of
the representative approaches. in the clustering, the cluster heads
gather data from nodes and sending them to the base station. In this
paper, we introduce a dynamic clustering algorithm using genetic
algorithm. This algorithm takes different parameters into
consideration to increase the network lifetime. To prove efficiency of
proposed algorithm, we simulated the proposed algorithm compared
with LEACH algorithm using the matlab
Abstract: Cluster analysis divides data into groups that are
meaningful, useful, or both. Analysis of biological data is creating a
new generation of epidemiologic, prognostic, diagnostic and
treatment modalities. Clustering of protein sequences is one of the
current research topics in the field of computer science. Linear
relation is valuable in rule discovery for a given data, such as if value
X goes up 1, value Y will go down 3", etc. The classical linear
regression models the linear relation of two sequences perfectly.
However, if we need to cluster a large repository of protein sequences
into groups where sequences have strong linear relationship with
each other, it is prohibitively expensive to compare sequences one by
one. In this paper, we propose a new technique named General
Regression Model Technique Clustering Algorithm (GRMTCA) to
benignly handle the problem of linear sequences clustering. GRMT
gives a measure, GR*, to tell the degree of linearity of multiple
sequences without having to compare each pair of them.