Abstract: Over the past decade, biclustering has become a popular data mining technique, not only in biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters the rows and columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) on the task of mining coherent, large-volume biclusters from web usage datasets. The experiments were conducted on two web usage datasets from a public dataset repository, and the performance of FA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results demonstrate the usefulness of DFA for the biclustering problem.
Abstract: In the Fe-3%Si sheets, grade Hi-B, with AlN and MnS
as inhibitors, the Goss grains which abnormally grow do not have a
size greater than the average size of the primary matrix. In this
heterogeneous microstructure, the size factor is not a required
condition for the secondary recrystallization. The onset of the small
Goss grain abnormal growth appears to be related to a particular
behavior of their grain boundaries, to the local texture and to the
distribution of the inhibitors. The presence and the evolution of
oriented clusters give the small Goss grains a neighborhood
favorable to growth. The modified Monte-Carlo approach applied
here considers the local environment of each grain. Grain growth
depends on the real spatial position of the grain, so the matrix
heterogeneity is taken into account. The grain growth conditions
are considered in the global matrix and in different matrices
corresponding to A component clusters. The grain growth behaviour
is considered in three settings: energy only; energy and mobility;
and energy, mobility and precipitates.
Abstract: This paper presents a new growing neural network for
cluster analysis and market segmentation, which optimizes the size
and structure of clusters by iteratively checking them for multivariate
normality. We combine the recently published SGNN approach [8]
with the basic principle underlying the Gaussian-means algorithm
[13] and the Mardia test for multivariate normality [18, 19]. The new
approach differs from existing ones in its holistic design and
its high degree of autonomy over the clustering process as a whole. Its
performance is demonstrated by means of synthetic 2D data and by
real lifestyle survey data usable for market segmentation.
Abstract: We develop a three-step fuzzy logic-based algorithm for clustering categorical attributes, and we apply it to analyze cultural data. In the first step the algorithm employs an entropy-based clustering scheme, which initializes the cluster centers. In the second step we apply the fuzzy c-modes algorithm to obtain a fuzzy partition of the data set, and the third step introduces a novel cluster validity index, which decides the final number of clusters.
Abstract: In this paper we focus on event extraction from Tamil
news articles. The system uses a scoring scheme, based on a feature
score and a condition score, to extract event-specific sentences from
each document and to cluster event-specific sentences across multiple
documents. The proposed system builds an event template based on a
user-specified query. The templates are filled with event-specific
details such as person, location and timeline, extracted from the
formed clusters. The proposed system applies these methods to Tamil
news articles that have been enconverted into UNL graphs using a
Tamil-to-UNL enconverter. The main aim of this work is to generate
an event-based template.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns in which different genes are correlated over a
subset of conditions in gene expression analysis. Association rule
mining is an efficient approach to biclustering, as in the
BIMODULE algorithm, but it is sensitive to the values of its
input parameters and to the discretization procedure used in the
preprocessing step. Moreover, when noise is present, classical
association rule miners discover multiple small fragments of the true
bicluster but miss the true bicluster itself. This paper formally
presents a generalized noise-tolerant bicluster model, termed
μBicluster. An iterative algorithm based on the proposed model, termed
BIDENS, is introduced that can discover a set of k possibly
overlapping biclusters simultaneously. Our model uses a more flexible
method to partition the dimensions in order to preserve meaningful and
significant biclusters. The proposed algorithm can discover biclusters
that are hard for BIMODULE to discover. An experimental study on yeast
and human gene expression data and several artificial datasets shows
that our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: In the world of Peer-to-Peer (P2P) networking
different protocols have been developed to make the resource sharing
or information retrieval more efficient. The SemPeer protocol is a
new layer on Gnutella that transforms the connections of the nodes
based on semantic information to make information retrieval more
efficient. However, this transformation causes high clustering in the
network, which decreases the number of nodes reached and therefore the
probability of finding a document. In this paper we
describe a mathematical model for the Gnutella and SemPeer
protocols that captures clustering-related issues, followed by a
proposition to modify the SemPeer protocol to achieve moderate
clustering. This modification is a form of link management for the
individual nodes that makes the SemPeer protocol more
efficient, because it increases the probability of a successful query
in the P2P network. To validate the models, we
ran a series of simulations, which supported our results.
Abstract: The Fuzzy C-means clustering algorithm (FCM) is a
method that is frequently used in pattern recognition. It gives
good modeling results in many cases, but
it cannot determine the number of clusters by itself. In
the FCM algorithm, most researchers fix the weighting exponent (m) at
the conventional value of 2, which might not be appropriate for all
applications. Consequently, the main objective of this paper is to use
the subtractive clustering algorithm to provide the optimal number of
clusters needed by the FCM algorithm, by optimizing the parameters of
the subtractive clustering algorithm with an iterative search approach,
and then to find an optimal weighting exponent (m) for the FCM
algorithm. To obtain the optimal number of clusters, the iterative
search approach is used to find the optimal single-output Sugeno-type
Fuzzy Inference System (FIS) model, by optimizing the
parameters of the subtractive clustering algorithm so as to minimize
the least-squares error between the actual data and the Sugeno fuzzy
model. Once the number of clusters is optimized, two
approaches are proposed to optimize the weighting exponent (m) in
the FCM algorithm: the iterative search approach and
genetic algorithms. The approach is tested on
data generated from the original function, and optimal fuzzy models
are obtained with minimum error between the real data and the
obtained fuzzy models.
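
To make the role of the weighting exponent (m) concrete, the following is a minimal sketch of the standard FCM update rules, not the authors' implementation. The function name `fcm`, the deterministic initialisation from evenly spaced input points, and the stopping tolerance are assumptions made for illustration. Larger values of m flatten the memberships; values close to 1 make them almost crisp.

```python
import math


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy C-means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[j][i] is the degree to
    which point j belongs to cluster i.
    """
    if m <= 1.0:
        raise ValueError("weighting exponent m must be greater than 1")
    dim = len(points[0])
    # Assumption: initial centers are evenly spaced input points.
    step = max(1, len(points) // n_clusters)
    centers = [list(points[i * step]) for i in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_i / d_k) ** power for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

An iterative search over m, as described in the abstract, could call `fcm` for a range of candidate exponents and keep the one that gives the smallest modeling error. The error measure itself is outside this sketch.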
Abstract: A predictive clustering hybrid regression (pCHR)
approach was developed and evaluated using a dataset from an
H2-producing sucrose-based bioreactor operated for 15 months. The aim
was to model and predict the H2-production rate using information
available about the envirome and metabolome of the bioprocess.
Self-organizing maps (SOM) and the Sammon map were used to visualize the
dataset and to identify the main metabolic patterns and clusters in the
bioprocess data. Three metabolic clusters were detected: acetate
coupled with other metabolites, butyrate only, and transition phases. The
developed pCHR model combines principles of k-means clustering,
kNN classification and regression techniques. The model performed
well in modeling and predicting the H2-production rate with mean
square error values of 0.0014 and 0.0032, respectively.
Abstract: Network security attacks are violations of
information security policy that have received much attention from the
computational intelligence community in recent decades. Data mining
has become a very useful technique for detecting network intrusions
by extracting useful knowledge from large volumes of network data
or logs. The naïve Bayesian classifier is one of the most popular data
mining algorithms for classification, and provides an optimal way
to predict the class of an unknown example. However, tests have shown that
a single set of probabilities derived from the data is not enough to achieve a
good classification rate. In this paper, we propose a new learning
algorithm for mining network logs to detect network intrusions
with a naïve Bayesian classifier. The algorithm first clusters the network
logs into several groups based on the similarity of the logs, and then
calculates the prior and conditional probabilities for each group of
logs. To classify a new log, the algorithm determines which cluster
the log belongs to and then uses that cluster's probability set to classify
the new log. We tested the performance of our proposed algorithm
on the KDD99 benchmark network intrusion detection dataset,
and the experimental results show that it improves detection rates
and reduces false positives for different types of network
intrusions.
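
The idea of keeping one probability set per cluster can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how logs are clustered, so the grouping function `cluster_of` is a hypothetical stand-in supplied by the caller, and the function names, the dictionary-based log format, and the Laplace smoothing are also assumptions.

```python
import math
from collections import Counter, defaultdict


def train(logs, labels, cluster_of):
    """Collect one naive Bayes probability set (as counts) per cluster."""
    models = {}
    for log, label in zip(logs, labels):
        model = models.setdefault(
            cluster_of(log),
            {"prior": Counter(),
             "cond": defaultdict(Counter),
             "values": defaultdict(set)},
        )
        model["prior"][label] += 1
        for feature, value in log.items():
            model["cond"][(label, feature)][value] += 1
            model["values"][feature].add(value)
    return models


def classify(models, log, cluster_of):
    """Classify a log using the probability set of its own cluster."""
    model = models.get(cluster_of(log))
    if model is None:
        return None  # no training logs fell into this cluster
    total = sum(model["prior"].values())
    best_label, best_score = None, -math.inf
    for label, count in model["prior"].items():
        score = math.log(count / total)  # log prior within the cluster
        for feature, value in log.items():
            seen = model["values"].get(feature, set())
            n_values = len(seen | {value})
            hits = model["cond"].get((label, feature), Counter())[value]
            # Laplace-smoothed conditional probability (an assumption).
            score += math.log((hits + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With this structure, the same feature value can lead to different predictions in different clusters, because each cluster carries its own priors and conditional probabilities.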
Abstract: In the past few years, the use of wireless sensor networks (WSNs) has increased in applications such as intrusion detection, forest fire detection, disaster management and the battlefield. Sensor nodes are generally battery-operated, low-cost devices. The key challenge in the design and operation of WSNs is to prolong the network lifetime by reducing the energy consumption of the sensor nodes. Node clustering is one of the most promising techniques for energy conservation. This paper presents a novel clustering algorithm that maximizes the network lifetime by reducing the number of communications among sensor nodes. The approach also includes a new distributed cluster formation technique that enables self-organization of a large number of nodes, and an algorithm that maintains a constant number of clusters by selecting cluster heads in advance and rotating the cluster head role to distribute the energy load evenly among all sensor nodes.
Abstract: Many real-world data sets have a very high-dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high-dimensional spaces, distances between points become relatively uniform. In such cases, density-based approaches may give better results. Subspace clustering algorithms automatically identify lower-dimensional subspaces of the higher-dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC (Intelligent Subspace Clustering), which tries to overcome three major limitations of the existing state-of-the-art techniques. First, ISC determines input parameters such as the ε-distance at various levels of subspace clustering, which helps in finding meaningful clusters. Second, because a uniform-parameter approach is not suitable for different kinds of databases, ISC implements dynamic and adaptive determination of meaningful clustering parameters based on a hierarchical filtering approach. The third and most important feature of ISC is its ability to learn incrementally and to include and exclude subspaces dynamically, which leads to better cluster formation.
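
The claim that distances become relatively uniform in high-dimensional spaces can be checked numerically. The sketch below is an illustration only and is not part of ISC: it draws uniformly random points in the unit hypercube and measures the relative contrast (farthest minus nearest distance, divided by the nearest distance) seen from one query point. The function name, the sample size, and the uniform distribution are assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for random points around a query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast shrinks as the dimensionality grows.
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

When the contrast is small, the nearest and farthest neighbors are almost equally far away, which is why purely distance-based clustering loses discriminative power in the full feature space.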
Abstract: Continuous measurements and multivariate methods are applied to study the effects of energy consumption on indoor air quality (IAQ) in a Finnish one-family house. The measured data used in this study were collected continuously in a house in Kuopio, Eastern Finland, over a fourteen-month period. The consumption parameters measured were the consumption of district heat, electricity and water. The indoor parameters gathered were temperature, relative humidity (RH), the concentrations of carbon dioxide (CO2) and carbon monoxide (CO), and differential air pressure. In this study, the self-organizing map (SOM) and Sammon's mapping were applied to resolve the effects of energy consumption on indoor air quality. The SOM was found to be a suitable method because it summarizes multivariable dependencies in an easily observable two-dimensional map. In addition, Sammon's mapping was used to cluster the pre-processed data in order to find similarities between the variables, expressing distances and groups in the data. The methods were able to distinguish 7 different clusters characterizing indoor air quality and energy efficiency in the study house. The results indicate that the cost implications, in euros, of heating and electricity energy vary according to the differential pressure, the concentration of carbon dioxide, temperature and season.
Abstract: Data gathering is an essential operation in wireless
sensor network applications, and it requires energy-efficient
techniques to increase the lifetime of the network.
Clustering is an effective technique for improving the energy
efficiency and network lifetime of wireless sensor networks. In this
paper, an energy-efficient cluster formation protocol is proposed with
the objective of achieving low energy dissipation and latency without
sacrificing application-specific quality. The objective is achieved by
applying randomized, adaptive, self-configuring cluster formation
and localized control for data transfers. It involves
application-specific data processing, such as data aggregation or compression.
The cluster formation algorithm allows each node to make
independent decisions, so as to generate good clusters in the end.
Simulation results show that the proposed protocol uses minimal
energy and latency for cluster formation, thereby reducing the
overhead of the protocol.
Abstract: Data clustering is an important data exploration
technique with many applications in data mining. The k-means
algorithm is well known for its efficiency in clustering large data
sets. However, this algorithm is suited to spherical clusters
of similar sizes and densities. The quality of the resulting clusters
decreases when the data set contains spherical clusters with large
variance in size. In this paper, we introduce an effective procedure
to overcome this problem. The proposed method is based on shifting
the center of the large cluster toward the small cluster and
recomputing the membership of the small cluster's points. The experimental
results show that the proposed algorithm produces satisfactory
results.
Abstract: For collecting data from all sensor nodes, some
changes to the Dynamic Source Routing (DSR) protocol are proposed. At
each hop level, a route-ranking technique is used to distribute
packets dynamically over different selected routes. The rank
of a route is calculated from parameters such as delay, residual energy and
probability of packet loss. A hybrid topology of
DMPR (Disjoint Multi Path Routing) and MMPR (Meshed Multi Path
Routing) is formed, where a braided topology is used in different
faulty zones of the network. To reduce energy consumption, variable
transmission ranges are used instead of a fixed transmission range. To
reduce the number of dropped packets, a fuzzy logic inference scheme is
used to insert different types of delays dynamically. A rule-based
system infers the membership function strength, which is used to
calculate the final amount of delay to be inserted at each node
in the different clusters.
In the braided path, a 'Dual Line ACK Link' scheme is
proposed for sending an ACK signal from a damaged node or link to a
parent node, to ensure that no link-error or node-failure
message is lost. This paper sets out the
theoretical aspects of a model that may be applied to collecting
data from any large hanging iron structure with the help of a wireless
sensor network. Analyzing these data is the subject of materials
science and civil structural construction technology and is outside
the scope of this paper.
Abstract: Most biclustering/projected clustering algorithms are based either on the Euclidean distance or on the correlation coefficient, which capture only linear relationships. However, in many applications, such as gene expression data and word-document data, nonlinear relationships may exist between the objects. Mutual information between two variables provides a more general criterion for investigating dependencies among variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering, in terms of both computation time and the types of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, no other existing biclustering algorithm has used mutual information as a similarity measure. We present experimental results on synthetic data as well as on yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.
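
A small worked example shows why mutual information is the more general criterion. The sketch below uses the textbook plug-in estimate for two discrete sequences and is not the authors' biclustering algorithm. It assumes the values are already discrete; continuous expression values would first need a discretization step that is not shown. The function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
        mi += (count / n) * math.log2(count * n / (count_x[x] * count_y[y]))
    return mi


def pearson(xs, ys):
    """Pearson correlation coefficient, for comparison."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    ys = [x * x for x in xs]  # purely nonlinear dependence
    print(pearson(xs, ys))             # zero: no linear relationship
    print(mutual_information(xs, ys))  # positive: the dependence is detected
```

For the quadratic relationship in the example the correlation coefficient is zero while the mutual information is clearly positive, which is the kind of dependence that distance- or correlation-based measures miss.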
Abstract: Mobile ad-hoc networks (MANETs) are a form of
wireless network that does not require a base station to provide
network connectivity. Mobile ad-hoc networks have many
characteristics that distinguish them from other wireless networks
and make routing in such networks a challenging task. Cluster-based
routing is one of the routing schemes for MANETs, in which
clusters of mobile nodes are formed, each with
its own clusterhead that is responsible for routing among clusters.
In this paper we propose and implement a distributed
weighted clustering algorithm for MANETs. The approach is based
on a combined weight metric that takes into account several system
parameters, such as the node degree, transmission range, energy and
mobility of the nodes. We have evaluated the performance of the
proposed scheme through simulation in various network situations.
Simulation results show that the proposed scheme outperforms the
original distributed weighted clustering algorithm (DWCA).
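
A combined weight metric of this kind can be sketched as a weighted sum, with the lowest-weight node in a neighborhood becoming clusterhead. The following is a hypothetical illustration, not the metric from the paper: the field names (`degree`, `neighbor_distance`, `mobility`, `energy_used`), the coefficients, and the ideal degree are all assumptions, and the election loop is a centralized emulation of what a distributed protocol would achieve through message exchange.

```python
def combined_weight(node, weights=(0.4, 0.2, 0.2, 0.2), ideal_degree=2):
    """Weighted sum of four node parameters; a lower value is better.

    All field names and coefficients are assumptions for illustration.
    """
    w_degree, w_distance, w_mobility, w_energy = weights
    return (
        w_degree * abs(node["degree"] - ideal_degree)
        + w_distance * node["neighbor_distance"]
        + w_mobility * node["mobility"]
        + w_energy * node["energy_used"]
    )


def elect_clusterheads(nodes, neighbors, **kwargs):
    """Map every node id to the id of its clusterhead.

    Nodes are visited in order of increasing weight; an undecided node
    becomes a clusterhead and claims its undecided neighbors as members.
    """
    weight = {nid: combined_weight(n, **kwargs) for nid, n in nodes.items()}
    undecided = set(nodes)
    assignment = {}
    for nid in sorted(nodes, key=lambda i: (weight[i], i)):
        if nid not in undecided:
            continue
        assignment[nid] = nid  # this node becomes a clusterhead
        undecided.discard(nid)
        for nb in neighbors.get(nid, ()):
            if nb in undecided:
                assignment[nb] = nid
                undecided.discard(nb)
    return assignment
```

Choosing the coefficients is the design decision that matters here: putting more weight on mobility favors stable clusterheads, while putting more weight on consumed energy spreads the clusterhead role away from depleted nodes.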
Abstract: This paper presents a new technique for detection of
human faces within color images. The approach relies on image
segmentation based on skin color, features extracted from the two-dimensional
discrete cosine transform (DCT), and self-organizing
maps (SOM). After candidate skin regions are extracted, feature
vectors are constructed using DCT coefficients computed from those
regions. A supervised SOM training session is used to cluster feature
vectors into groups, and to assign "face" or "non-face" labels to those
clusters. Evaluation was performed using a new image database of
286 images, containing 1027 faces. After training, our detection
technique achieved a detection rate of 77.94% during subsequent
tests, with a false positive rate of 5.14%. To our knowledge, the
proposed technique is the first to combine DCT-based feature
extraction with a SOM for detecting human faces within color
images. It is also one of a few attempts to combine a feature-invariant
approach, such as color-based skin segmentation, together with
appearance-based face detection. The main advantage of the new
technique is its low computational requirements, in terms of both
processing speed and memory utilization.
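
The DCT-based feature extraction step can be sketched as follows. This is a plain implementation of the orthonormal two-dimensional DCT-II on a square block, written for clarity rather than speed, and it is not the authors' code. The abstract does not say which coefficients form the feature vector, so keeping the top-left (low-frequency) k-by-k coefficients is an assumption, as are the function names.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of equal-length rows)."""
    n = len(block)

    def alpha(k):
        # Normalisation factors of the orthonormal DCT-II.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def low_frequency_features(block, k=3):
    """Feature vector from the k-by-k low-frequency corner (an assumption)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]
```

Feature vectors built this way from candidate skin regions could then be passed to a SOM for clustering. A production system would normally use a fast DCT routine instead of the quadruple loop shown here.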
Abstract: The cellular network is one of the emerging areas of
communication, in which the mobile nodes act as members of one
base station. Cluster-based communication is now an emerging
area of wireless cellular multimedia networks. Clustering provides
fast communication and a convenient way to manage
connectivity. In our scheme we propose an optimization
technique for the fuzzy cluster nodes that divides the group
members into three categories: long refreshable members, medium
refreshable members and short refreshable members. Treating
long refreshable nodes as static nodes, we compute the new
membership values for the other nodes in the cluster. We compare
their previous and present membership values with a threshold value
to assign them to one of the three categories. In this way, we
optimize the nodes in the fuzzy clusters. The simulation results show
a reduction in the cluster computation time and
iteration time after optimization.