Abstract: In the world of Peer-to-Peer (P2P) networking, various
protocols have been developed to make resource sharing and
information retrieval more efficient. The SemPeer protocol is a
new layer on Gnutella that transforms the connections of the nodes
based on semantic information to make information retrieval more
efficient. However, this transformation causes high clustering in the
network, which decreases the number of nodes reached and therefore
the probability of finding a document. In this paper we describe a
mathematical model for the Gnutella and SemPeer protocols that
captures clustering-related issues, followed by a proposal to modify
the SemPeer protocol to achieve moderate clustering. This
modification is a form of link management for the individual nodes
that allows the SemPeer protocol to be more efficient, because the
probability of a successful query in the P2P network is appreciably
increased. To validate the models, we ran a series of simulations
that support our results.
Abstract: Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP-hard. Genetic algorithms have been used to perform clustering in a wide variety of fields; however, the technique's running time normally grows long with the size of the input set. This paper proposes an efficient genetic algorithm for clustering very large data sets, especially image data sets. The genetic algorithm uses time-efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.
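As an illustrative sketch only (not the paper's algorithm, whose time-efficient operators and preprocessing are not detailed here), a genetic algorithm for clustering can encode candidate centroids as chromosomes and use the clustering error (sum of squared distances) as fitness; the one-dimensional data and GA settings below are hypothetical:

```python
import random

def sse(centroids, points):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

def ga_cluster(points, k=2, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    # Each chromosome encodes k candidate centroids.
    pop = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ch: sse(ch, points))
        survivors = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]               # one-point crossover
            if rng.random() < 0.3:                  # mutation: jitter one centroid
                i = rng.randrange(k)
                child = child[:]
                child[i] += rng.gauss(0, (hi - lo) * 0.05)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda ch: sse(ch, points))

data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]   # two obvious groups
best = sorted(ga_cluster(data, k=2))
```

With elitist selection the best chromosome found so far always survives, so the fitness is non-increasing over generations.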
Abstract: In this study, a new criterion is proposed for determining the number of classes into which an image should be segmented. This criterion is based on discriminant analysis, measuring the separability among the segmented classes of pixels. Based on the new discriminant criterion, two algorithms are proposed for recursively segmenting the image into the determined number of classes. The proposed methods can automatically and correctly segment objects under various illuminations into separate images for further processing. Experiments on the extraction of text strings from complex document images demonstrate the effectiveness of the proposed methods.
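The paper's discriminant criterion is not reproduced here; as a hedged stand-in, the sketch below measures the separability of a two-class split of pixel intensities by the classic ratio of between-class variance to total variance (an Otsu-style criterion), with hypothetical pixel values:

```python
def separability(pixels, threshold):
    # Discriminant (Otsu-style) criterion: ratio of between-class variance
    # to total variance for a two-class split at `threshold`.
    c0 = [p for p in pixels if p <= threshold]
    c1 = [p for p in pixels if p > threshold]
    if not c0 or not c1:
        return 0.0                      # degenerate split: no separability
    n = len(pixels)
    mu = sum(pixels) / n
    w0, w1 = len(c0) / n, len(c1) / n
    mu0, mu1 = sum(c0) / len(c0), sum(c1) / len(c1)
    between = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2
    total = sum((p - mu) ** 2 for p in pixels) / n
    return between / total              # near 1.0 => well-separated classes

pixels = [10, 12, 11, 200, 205, 198]    # e.g. text vs. background intensities
score = separability(pixels, threshold=100)
```

Recursing on each class while this score stays high would mirror the recursive segmentation the abstract describes.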
Abstract: Reliable secure multicast communication in mobile
ad hoc networks is challenging due to inherent characteristics such
as an infrastructure-less architecture lacking a central authority,
high packet-loss rates, and limited resources such as bandwidth,
time and power. Many emerging commercial and military applications
require secure multicast communication in ad hoc environments. Key
management is therefore the fundamental challenge in achieving
reliable secure communication using multicast key distribution for
mobile ad hoc networks, and in designing a reliable multicast key
distribution scheme, reliability and congestion control over
throughput are essential components. This paper proposes and
evaluates the performance of an enhanced optimized multicast cluster
tree algorithm with the destination-sequenced distance vector
routing protocol to provide reliable multicast key distribution.
Simulation results in NS2 predict the performance of the proposed
scheme in terms of key delivery ratio and packet loss rate under
varying network conditions. The proposed scheme achieves
reliability, exhibiting a low packet loss rate and a high key
delivery ratio compared with the existing scheme.
Abstract: Understanding customer behavior in a grocery store has
been a long-standing issue in the retailing industry. The advent of
RFID has made it easier to collect movement data for an individual
shopper's behavior. Most previous studies used traditional
statistical clustering techniques to find the major characteristics
of customer behavior, especially the shopping path. However, due to
various spatial constraints in the store, standard clustering
methods are not directly applicable: moving data such as shopping
paths must be adjusted in advance of the analysis, which is
time-consuming and causes data distortion. To alleviate this
problem, we propose a new approach to spatial pattern clustering
based on the longest common subsequence. Experimental results using
real data obtained from a grocery store confirm the good performance
of the proposed method in finding the hot spots, dead spots and
major path patterns of customer movements.
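The core of such an approach is the longest-common-subsequence measure itself, which can be applied directly to shopping paths encoded as zone sequences (the zone names below are hypothetical; the paper's actual clustering procedure is not reproduced):

```python
def lcs_length(a, b):
    # Standard dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def path_similarity(p, q):
    # Normalized LCS similarity between two shopping paths (zone sequences);
    # order is preserved but gaps (detours) are tolerated.
    return lcs_length(p, q) / max(len(p), len(q))

path1 = ["entry", "produce", "dairy", "bakery", "checkout"]
path2 = ["entry", "dairy", "meat", "bakery", "checkout"]
sim = path_similarity(path1, path2)   # common subsequence: entry, dairy, bakery, checkout
```

Because LCS tolerates insertions and deletions, paths need no prior spatial adjustment, which is precisely the distortion problem the abstract raises.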
Abstract: The fuzzy c-means (FCM) clustering algorithm is
frequently used in pattern recognition. It has the advantage of
giving good modeling results in many cases, although it cannot
determine the number of clusters by itself. In the FCM algorithm
most researchers fix the weighting exponent (m) at the conventional
value of 2, which may not be appropriate for all applications.
Consequently, the main objective of this paper is to use the
subtractive clustering algorithm to provide the optimal number of
clusters needed by the FCM algorithm, by optimizing the parameters
of the subtractive clustering algorithm with an iterative search
approach, and then to find an optimal weighting exponent (m) for the
FCM algorithm. To obtain an optimal number of clusters, the
iterative search approach is used to find the optimal single-output
Sugeno-type Fuzzy Inference System (FIS) model, optimizing the
parameters of the subtractive clustering algorithm so as to give the
minimum least-squares error between the actual data and the Sugeno
fuzzy model. Once the number of clusters is optimized, two
approaches are proposed to optimize the weighting exponent (m) in
the FCM algorithm, namely an iterative search approach and genetic
algorithms. The approach is tested on data generated from the
original function, and optimal fuzzy models are obtained with
minimum error between the real data and the obtained fuzzy models.
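A minimal sketch of the subtractive clustering step, assuming Chiu's standard potential function (the paper's iterative optimization of the radius ra is not shown); the data values are hypothetical:

```python
import math

def potentials(points, ra=1.0):
    # Subtractive clustering: each point's potential is the sum of Gaussian
    # contributions from all points within roughly radius ra; points in
    # dense regions score high and become cluster-center candidates.
    alpha = 4.0 / ra ** 2
    return [sum(math.exp(-alpha * (x - y) ** 2) for y in points) for x in points]

data = [0.0, 0.1, 0.2, 5.0, 5.1]
p = potentials(data, ra=1.0)
first_center = data[p.index(max(p))]   # densest point becomes the first center
```

In the full algorithm, the first center's potential is then subtracted from its neighbors and the process repeats, yielding the cluster count that FCM needs.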
Abstract: A predictive clustering hybrid regression (pCHR)
approach was developed and evaluated using a dataset from an
H2-producing, sucrose-based bioreactor operated for 15 months. The
aim was to model and predict the H2-production rate using available
information about the envirome and metabolome of the bioprocess.
Self-organizing maps (SOM) and a Sammon map were used to visualize
the dataset and to identify the main metabolic patterns and clusters
in the bioprocess data. Three metabolic clusters were detected:
acetate coupled with other metabolites, butyrate only, and
transition phases. The developed pCHR model combines the principles
of k-means clustering, kNN classification and regression techniques.
The model performed well in modeling and predicting the
H2-production rate, with mean square error values of 0.0014 and
0.0032, respectively.
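A toy sketch of how such a hybrid might work, under the assumption (not spelled out in the abstract) that a query is routed to its nearest cluster centroid and then handled by a per-cluster linear regression; all data values are hypothetical:

```python
def fit_line(xs, ys):
    # Ordinary least-squares fit y = a + b*x within one cluster.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def predict(x, centroids, models):
    # kNN-style step: route the query to the nearest cluster centroid,
    # then apply that cluster's own regression model.
    k = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
    a, b = models[k]
    return a + b * x

# Two operating regimes with different linear responses.
cluster_x = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
cluster_y = [[2.0, 4.0, 6.0], [5.0, 5.5, 6.0]]
centroids = [sum(xs) / len(xs) for xs in cluster_x]
models = [fit_line(xs, ys) for xs, ys in zip(cluster_x, cluster_y)]
```

The point of the hybrid is that each metabolic regime gets its own simple model instead of forcing one global fit across regimes.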
Abstract: Network security attacks are violations of information
security policy that have received much attention from the
computational intelligence community in recent decades. Data mining
has become a very useful technique for detecting network intrusions
by extracting useful knowledge from large volumes of network data or
logs. The naïve Bayesian classifier is one of the most popular data
mining algorithms for classification, providing an optimal way to
predict the class of an unknown example. It has been shown, however,
that a single set of probabilities derived from the data is not
enough to achieve a good classification rate. In this paper, we
propose a new learning algorithm for mining network logs to detect
network intrusions with a naïve Bayesian classifier: it first
clusters the network logs into several groups based on their
similarity, and then calculates the prior and conditional
probabilities for each group of logs. To classify a new log, the
algorithm determines which cluster the log belongs to and then uses
that cluster's probability set to classify it. We tested the
performance of the proposed algorithm on the KDD99 benchmark network
intrusion detection dataset, and the experimental results show that
it improves detection rates as well as reducing false positives for
different types of network intrusions.
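A minimal sketch of the per-cluster probability idea, assuming the logs are already clustered (the similarity-based clustering step is omitted) and using add-one smoothing; the feature values and labels are hypothetical, not KDD99 records:

```python
from collections import defaultdict

def train(clustered_logs):
    # For each cluster of logs, estimate its own prior and conditional
    # probabilities (with add-one smoothing) instead of one global set.
    models = {}
    for cid, logs in clustered_logs.items():
        prior, cond, values = defaultdict(int), defaultdict(int), set()
        for features, label in logs:
            prior[label] += 1
            for f in features:
                cond[(label, f)] += 1
                values.add(f)
        models[cid] = (prior, cond, len(logs), len(values))
    return models

def classify(features, cid, models):
    # Use only the probability set of the cluster the new log falls in.
    prior, cond, n, v = models[cid]
    best, best_p = None, -1.0
    for label, count in prior.items():
        p = count / n                                   # cluster-local prior
        for f in features:
            p *= (cond[(label, f)] + 1) / (count + v)   # add-one smoothing
        if p > best_p:
            best, best_p = label, p
    return best

logs = {0: [(("tcp", "http"), "normal"), (("tcp", "http"), "normal"),
            (("udp", "dns"), "attack")]}
model = train(logs)
```

A full implementation would first pick the cluster for a new log by similarity; here the cluster id is passed in directly.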
Abstract: This paper describes the recognition and classification of brain images as normal or abnormal based on PSO-SVM. Image classification is becoming increasingly important for the medical diagnosis process: classifying a patient's abnormality plays a great role in helping doctors diagnose the patient according to the severity of the disease. In the case of DICOM images, optimal recognition and early detection of diseases is very difficult. Our work focuses on the recognition and classification of DICOM images based on a collective approach of digital image processing. For optimal recognition and classification, Particle Swarm Optimization (PSO), the Genetic Algorithm (GA) and the Support Vector Machine (SVM) are used. The collective PSO-SVM approach gives high approximation capability and much faster convergence.
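As a hedged illustration of the PSO component only: the sketch below minimizes a stand-in quadratic objective over two hypothetical SVM hyperparameters (C, gamma); in the paper's setting the objective would instead be the SVM's classification error on the brain images:

```python
import random

def pso(objective, bounds, n_particles=15, iters=60, seed=1):
    # Minimal particle swarm: each particle is pulled toward its personal
    # best position and the swarm's global best position.
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]                      # inertia
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = objective(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Stand-in objective with a known optimum at (C, gamma) = (2, 0.5).
surrogate = lambda p: (p[0] - 2.0) ** 2 + (p[1] - 0.5) ** 2
best, best_f = pso(surrogate, [(0.1, 10.0), (0.01, 2.0)])
```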
Abstract: Monitoring tool flank wear without affecting throughput
is considered a prudent method in production technology, since the
examination must be done without disturbing the machining process.
In this paper we propose a novel approach that determines tool flank
wear by observing the sound signals emitted during the turning
process. The workpiece materials used were steel and aluminum, and
the cutting insert was carbide. Two different cutting speeds were
used; the feed rate and the cutting depth were constant, whereas the
flank wear was a variable. The sound emitted by a fresh tool (0 mm
flank wear), a slightly worn tool (0.2-0.25 mm flank wear) and a
severely worn tool (0.4 mm and above flank wear) during the turning
process was recorded separately using a highly sensitive microphone.
Singular Value Decomposition (SVD) was applied to these sound
signals to extract the characteristic sound components. The results
show that an increase in tool flank wear correlates with an increase
in the values of the SVD features produced from the sound signals
for both materials. Hence it can be concluded that monitoring tool
flank wear during turning using SVD features of the emitted sound
signal, with fuzzy c-means classification, is a promising and
relatively simple method.
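As a sketch of the SVD feature extraction, assuming the feature of interest is the largest singular value of a signal matrix (the paper's exact feature set is not specified here), it can be estimated with power iteration on A^T A; the matrix below is a toy stand-in for a matrix of sound frames:

```python
import math

def largest_singular_value(a, iters=100):
    # Power iteration on A^T A: its dominant eigenvalue is sigma_1^2,
    # so its square root is the largest singular value of A.
    rows, cols = len(a), len(a[0])
    v = [1.0] * cols
    for _ in range(iters):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]          # re-normalize the estimate
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in av))

frames = [[3.0, 0.0], [4.0, 5.0]]          # toy "sound frame" matrix
sigma1 = largest_singular_value(frames)    # = sqrt(45), the exact value here
```

In the wear-monitoring setting, the feature vector for classification would collect such singular values per recording.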
Abstract: Prediction of fault-prone modules provides one way to
support software quality engineering. Clustering is used to
determine the intrinsic grouping in a set of unlabeled data, and
among the various clustering techniques available in the literature,
the K-Means approach is the most widely used. This paper introduces
a K-Means-based clustering approach for finding the fault proneness
of object-oriented systems. Its contribution is that metric values
from the JEdit open-source software are used to generate rules for
categorizing software modules as faulty or non-faulty, after which
the approach is empirically validated. The results are measured in
terms of accuracy of prediction, probability of detection and
probability of false alarms.
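A minimal sketch of the K-Means step on one-dimensional metric values (Lloyd's algorithm); the metric values and initial centroids below are hypothetical, not JEdit data:

```python
def kmeans(points, centroids, iters=20):
    # Lloyd's algorithm: assign each module's metric value to its nearest
    # centroid, then recompute each centroid as its cluster's mean.
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda c: (p - centroids[c]) ** 2)
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical per-module metric values; the low cluster would map to
# "non-faulty" and the high cluster to "faulty" modules.
metrics = [2.0, 3.0, 2.5, 15.0, 16.0, 14.5]
centroids, clusters = kmeans(metrics, centroids=[0.0, 10.0])
```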
Abstract: The weighting exponent m is called the fuzzifier; it
influences the clustering performance of fuzzy c-means (FCM), and
m ∈ [1.5, 2.5] is suggested by Pal and Bezdek [13]. In this paper,
we discuss the robustness properties of FCM and show that the
parameter m influences the robustness of FCM. According to our
analysis, a large m value makes FCM more robust to noise and
outliers. However, if m is larger than the theoretical upper bound
proposed by Yu et al. [14], the sample mean becomes the unique
optimizer. We therefore suggest implementing the FCM algorithm with
m ∈ [1.5, 4], under the restriction that m remains smaller than the
theoretical upper bound.
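The flattening effect of a large m can be seen directly in the FCM membership formula u_i = 1 / Σ_j (d_i/d_j)^(2/(m-1)); the distances below are hypothetical:

```python
def memberships(dists, m):
    # FCM membership of one sample given its distances to the cluster
    # centers: u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)).
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** e for dj in dists) for di in dists]

# A sample twice as far from center 2 as from center 1.
sharp = memberships([1.0, 2.0], m=1.5)   # low m: near-crisp memberships
soft  = memberships([1.0, 2.0], m=4.0)   # high m: memberships flatten
```

As m grows, memberships drift toward uniform, which is why every center drifts toward the sample mean once m exceeds the theoretical upper bound.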
Abstract: In the past few years, the use of wireless sensor networks (WSNs) has increased considerably in applications such as intrusion detection, forest fire detection, disaster management and the battlefield. Sensor nodes are generally battery-operated, low-cost devices. The key challenge in the design and operation of WSNs is to prolong the network lifetime by reducing the energy consumption of the sensor nodes. Node clustering is one of the most promising techniques for energy conservation. This paper presents a novel clustering algorithm that maximizes the network lifetime by reducing the number of communications among sensor nodes. The approach also includes a new distributed cluster formation technique that enables self-organization of a large number of nodes, and an algorithm for maintaining a constant number of clusters by prior selection of the cluster head and by rotating the cluster-head role to evenly distribute the energy load among all sensor nodes.
Abstract: There are several approaches to solving the Quantitative
Structure-Activity Relationship (QSAR) problem, based either on
statistical methods or on predictive data mining. Among the
statistical methods, one should consider regression analysis,
pattern recognition (such as cluster analysis, factor analysis and
principal components analysis) or partial least squares. Predictive
data mining techniques use neural networks, genetic programming, or
neuro-fuzzy knowledge. These approaches have low explanatory
capability or none at all. This paper attempts to establish a new
approach to solving QSAR problems using descriptive data mining. In
this way, the relationship between the chemical properties and the
activity of a substance can be comprehensibly modeled.
Abstract: Many real-world data sets have a very high-dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters, but in high-dimensional spaces distances between points become relatively uniform; in such cases, density-based approaches may give better results. Subspace clustering algorithms automatically identify lower-dimensional subspaces of the higher-dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC (Intelligent Subspace Clustering), which tries to overcome three major limitations of the existing state-of-the-art techniques. First, ISC determines input parameters such as the ε-distance at the various levels of subspace clustering, which helps in finding meaningful clusters; a uniform-parameter approach is not suitable for different kinds of databases. Second, ISC implements dynamic and adaptive determination of meaningful clustering parameters based on a hierarchical filtering approach. The third and most important feature of ISC is its capacity for incremental learning and dynamic inclusion and exclusion of subspaces, which leads to better cluster formation.
Abstract: This paper describes a new parallel sorting algorithm,
based on odd-even mergesort, called division and concurrent merging.
The main idea of the algorithm is that each processor uses a
sequential algorithm to sort a part of the vector, and the
processors then work in pairs to merge two of these sorted sections
into a larger, also sorted, one; after several iterations, the
vector is completely sorted. The paper describes the implementation
of the new algorithm in a message-passing environment (MPI). It also
compares the experimental results with the sequential quicksort
algorithm and with parallel MPI implementations of quicksort and
bitonic sort. The comparison was carried out on an 8-processor
cluster under GNU/Linux, each node being a single-processor PC.
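A sequential sketch of the scheme under one common interpretation (block odd-even transposition: local sort, then alternating even/odd pairwise merge-splits); the actual MPI implementation is not reproduced, and the input vector is hypothetical:

```python
def merge_split(a, b):
    # Two paired "processors" merge their sorted blocks; the lower-ranked
    # one keeps the smaller half, the higher-ranked one the larger half.
    merged = sorted(a + b)
    return merged[:len(a)], merged[len(a):]

def odd_even_block_sort(vector, p=4):
    # Assumes len(vector) divisible by p. Phase 1: each processor sorts
    # its local block. Phase 2: p rounds alternating even pairs (0-1,
    # 2-3, ...) and odd pairs (1-2, 3-4, ...), as in odd-even transposition.
    size = len(vector) // p
    blocks = [sorted(vector[i * size:(i + 1) * size]) for i in range(p)]
    for step in range(p):
        start = step % 2
        for i in range(start, p - 1, 2):
            blocks[i], blocks[i + 1] = merge_split(blocks[i], blocks[i + 1])
    return [x for b in blocks for x in b]

data = [9, 3, 7, 1, 8, 2, 6, 4]
result = odd_even_block_sort(data, p=4)
```

In the MPI version each `merge_split` becomes a pairwise message exchange, which is why only neighbouring ranks ever communicate.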
Abstract: Continuous measurements and multivariate methods are applied to research the effects of energy consumption on indoor air quality (IAQ) in a Finnish one-family house. The measured data used in this study were collected continuously in a house in Kuopio, Eastern Finland, over a fourteen-month period. The consumption parameters measured were district heat, electricity and water; the indoor parameters gathered were temperature, relative humidity (RH), the concentrations of carbon dioxide (CO2) and carbon monoxide (CO), and differential air pressure. The self-organizing map (SOM) and Sammon's mapping were applied to resolve the effects of energy consumption on indoor air quality. The SOM proved a suitable method, as it summarizes multivariable dependencies into an easily observable two-dimensional map; complementing it, Sammon's mapping was used to cluster the pre-processed data, finding similarities among the variables and expressing distances and groups in the data. The methods were able to distinguish 7 different clusters characterizing indoor air quality and energy efficiency in the study house. The results indicate that the cost, in euros, of heating and electricity varies with the differential pressure, the concentration of carbon dioxide, the temperature and the season.
Abstract: Due to limited energy resources, energy-efficient operation of sensor nodes is a key issue in wireless sensor networks. Clustering is an effective method to prolong the lifetime of an energy-constrained wireless sensor network. However, clustering in wireless sensor networks faces several challenges, such as selecting an optimal group of sensor nodes as a cluster, optimally selecting the cluster head, an energy-balanced strategy for rotating the role of cluster head within a cluster, maintaining intra- and inter-cluster connectivity, and optimal data routing in the network. In this paper, we propose a protocol supporting energy-efficient clustering, cluster-head selection/rotation and data routing to prolong the lifetime of the sensor network. Simulation results demonstrate that the proposed protocol prolongs network lifetime due to its efficient clustering, cluster-head selection/rotation and data routing.
Abstract: K-Means (KM) is considered one of the major algorithms
widely used in clustering. However, it still has some problems: one
lies in its initialization step, which is normally performed
randomly; another is that it converges to local minima. Genetic
algorithms are evolutionary algorithms inspired by nature and are
also utilized in the field of clustering. In this paper, we propose
two algorithms to solve the initialization problem: Genetic
Algorithm Initializes KM (GAIK) and KM Initializes Genetic Algorithm
(KIGA). To show the effectiveness and efficiency of our algorithms,
a comparative study was carried out among GAIK, KIGA, the
Genetic-based Clustering Algorithm (GCA), and FCM [19].
Abstract: Data gathering is an essential operation in wireless
sensor network applications, so energy-efficient techniques are
required to increase the lifetime of the network. Clustering is
likewise an effective technique to improve the energy efficiency and
network lifetime of wireless sensor networks. In this paper, an
energy-efficient cluster formation protocol is proposed with the
objective of achieving low energy dissipation and latency without
sacrificing application-specific quality. The objective is achieved
by applying randomized, adaptive, self-configuring cluster formation
and localized control for data transfers, and it involves
application-specific data processing such as data aggregation or
compression. The cluster formation algorithm allows each node to
make independent decisions, so as to generate good clusters in the
end. Simulation results show that the proposed protocol uses minimum
energy and latency for cluster formation, thereby reducing the
protocol's overhead.
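The abstract does not state the election rule; assuming a LEACH-style randomized, rotating cluster-head election (an assumption, since the protocol's own rule is not given), the threshold T = P / (1 - P·(r mod 1/P)) guarantees each node serves as head once per epoch of 1/P rounds:

```python
import random

def election_threshold(p, r):
    # LEACH-style threshold for a node that has not yet been cluster head
    # in the current epoch; it rises to 1.0 by round (1/p - 1), so every
    # remaining eligible node is forced to lead before the epoch ends.
    return p / (1 - p * (r % round(1 / p)))

def elect_heads(node_ids, eligible, p, r, rng):
    # Each eligible node independently draws a random number; a draw
    # below the threshold makes it a cluster head for this round.
    t = election_threshold(p, r)
    return [n for n in node_ids if n in eligible and rng.random() < t]

rng = random.Random(7)
nodes = list(range(20))
# Late in the epoch (r = 9 with p = 0.1) the threshold reaches 1.0,
# so all still-eligible nodes self-elect.
heads_late = elect_heads(nodes, set(nodes), p=0.1, r=9, rng=rng)
```

Rotating the head role this way spreads the energy-hungry aggregation and transmission load evenly, which is the protocol's stated goal.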