Abstract: An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered on the principle of maximizing intraclass similarity and minimizing interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. A hierarchical representation allows us to manage the complexity of knowledge easily, to view the knowledge at different levels of detail, and to focus our attention on the interesting aspects only. One such efficient and easy-to-understand system is the Hierarchical Production Rules (HPRs) system. An HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If <condition> Generality Specificity. HPR systems are capable of handling the taxonomical structures inherent in knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by an HPR-tree. This paper discusses an algorithm based on a cumulative learning scenario for the dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from previous episodes and also maintains a summary of the clusters as a Synopsis to be used in future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.
Abstract: I/O workload is a critical factor in
analyzing I/O patterns and file system performance. However, tracing I/O
operations on the fly in a distributed parallel file system is non-trivial due
to collection overhead and the large volume of data. In this paper, we
design and implement a parallel file system logging method for high
performance computing using a shared memory-based multi-layer
scheme. It minimizes overhead by reducing the response time of logging
operations and provides an efficient post-processing scheme through
shared memory. A separate logging server can collect sequential logs
from multiple clients in a cluster through packet communication.
Implementation and evaluation results show the low overhead and high
scalability of this architecture for high performance parallel logging
analysis.
Abstract: A new technique of topological multi-scale analysis is
introduced. By performing a clustering recursively to build a
hierarchy, and analyzing the co-scale and intra-scale similarities, an
Iterated Function System can be extracted from any data set. The study
of fractals shows that this method extracts self-similarities
efficiently and can find elegant solutions to the inverse problem of
building fractals. The theoretical aspects and practical
implementations are discussed, together with examples of analyses of
simple fractals.
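For readers unfamiliar with the term, the following is a minimal sketch of what an Iterated Function System is. It is not the extraction method of the paper. It assumes the standard three-map IFS of the Sierpinski triangle and approximates its attractor with the chaos game; the function name `chaos_game` and the vertex coordinates are illustrative choices.

```python
import random

# Three contraction maps of the Sierpinski triangle IFS: each map moves a
# point halfway toward one vertex (the vertices are an illustrative choice).
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def chaos_game(n_points, seed=0):
    """Approximate the IFS attractor by applying randomly chosen maps."""
    rng = random.Random(seed)
    x, y = 0.25, 0.25  # any starting point inside the triangle works
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(VERTICES)
        # Contraction with ratio 1/2 toward the chosen vertex.
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points


points = chaos_game(5000)
```

The inverse problem mentioned in the abstract runs in the opposite direction: given only a point set like `points`, recover the maps that generate it.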
Abstract: This paper presents a comparative analysis of a new
unsupervised PCA-based technique for steel plate texture segmentation
towards defect detection. The proposed scheme, called Variance
Based Component Analysis or VBCA, employs PCA for feature
extraction, applies a feature reduction algorithm based on the variance of
eigenpictures, and classifies the pixels as defective or normal. While
the classic PCA uses a clusterer such as k-means for pixel clustering,
VBCA employs thresholding and some post-processing operations to
label pixels as defective or normal. The experimental results show
that the proposed VBCA algorithm is 12.46% more accurate and
78.85% faster than the classic PCA.
Abstract: Clustering is the process of identifying homogeneous
groups of objects, called clusters, and is an interesting topic
in data mining. The members of a group or class have similar characteristics.
This paper discusses a robust clustering process for image data with
two dimension reduction approaches, i.e. two-dimensional
principal component analysis (2DPCA) and principal component
analysis (PCA). A standard approach to handling high-dimensional data is
dimension reduction, which transforms high-dimensional data into
a lower-dimensional space with limited loss of information. One of
the most common forms of dimensionality reduction is principal
component analysis (PCA). 2DPCA is often called a variant of
PCA in which the image matrices are treated directly
as 2D matrices; they do not need to be transformed into vectors, so
the covariance matrix of the images can be constructed directly from
the original image matrices. The decomposed classical covariance
matrix is very sensitive to outlying observations. The objective of
this paper is to compare the performance of the robust minimizing vector
variance (MVV) approach in two-dimensional projection PCA (2DPCA)
and in PCA for clustering arbitrary image data when outliers
are hidden in the data set. The simulation aspects of robustness and
an illustration of image clustering are discussed at the end of
the paper.
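As an illustration of how 2DPCA builds its covariance matrix directly from image matrices, here is a minimal sketch in plain Python. It computes the classical (non-robust) image covariance matrix G = (1/M) Σ (A_i − Ā)ᵀ(A_i − Ā), which is the quantity the abstract describes as sensitive to outliers; the robust MVV estimate is not shown. The function name `image_covariance` and the toy data are assumptions made for this example.

```python
def image_covariance(images):
    """Classical 2DPCA image covariance matrix (n x n for m x n images).

    images: list of equally sized matrices given as lists of rows.
    """
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    # Element-wise mean image.
    mean = [[sum(img[r][c] for img in images) / count for c in range(cols)]
            for r in range(rows)]
    g = [[0.0] * cols for _ in range(cols)]
    for img in images:
        for r in range(rows):
            centered = [img[r][c] - mean[r][c] for c in range(cols)]
            # Add this row's outer product; summing over rows gives
            # (A - mean)^T (A - mean) without vectorizing the image.
            for a in range(cols):
                for b in range(cols):
                    g[a][b] += centered[a] * centered[b] / count
    return g


# Two hypothetical 2 x 2 "images".
images = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [5.0, 8.0]],
]
G = image_covariance(images)
```

For m x n images, G is only n x n, whereas vectorizing each image for ordinary PCA would give an (mn) x (mn) covariance matrix.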
Abstract: Due to limited energy resources, the energy-efficient operation of sensor nodes is a key issue in wireless sensor networks. Clustering is an effective method to prolong the lifetime of an energy-constrained wireless sensor network. However, clustering in wireless sensor networks faces several challenges, such as selecting an optimal group of sensor nodes as a cluster, selecting the optimal cluster head, finding an energy-balanced strategy for rotating the role of cluster head within a cluster, maintaining intra- and inter-cluster connectivity, and routing data optimally through the network. In this paper, we propose a protocol supporting energy-efficient clustering, cluster head selection/rotation and data routing to prolong the lifetime of the sensor network. Simulation results demonstrate that the proposed protocol prolongs network lifetime through efficient clustering, cluster head selection/rotation and data routing.
Abstract: Data clustering is an important data exploration
technique with many applications in data mining. The k-means
algorithm is well known for its efficiency in clustering large data
sets. However, this algorithm is suitable only for spherical shaped clusters
of similar sizes and densities. The quality of the resulting clusters
decreases when the data set contains spherical shaped clusters with large
variance in size. In this paper, we introduce a competent procedure
to overcome this problem. The proposed method is based on shifting
the center of the large cluster toward the small cluster and recomputing
the membership of the small cluster's points. The experimental
results reveal that the proposed algorithm produces satisfactory
results.
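For reference, the baseline that the abstract sets out to improve can be sketched as follows. This is plain Lloyd-style k-means on 2-D points, not the proposed center-shifting procedure; the function name `kmeans`, the fixed initial centers and the toy data are assumptions made for this example.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points, starting from the given centers."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2
                + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        centers = new_centers
    return labels, centers


# Hypothetical data: one group near the origin, one near (10, 10).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Because every point simply goes to its nearest center, a very large cluster tends to lose its outer points to the center of a small neighbouring cluster, which is the situation the abstract addresses.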
Abstract: The aim of this paper is to understand how peers can
influence adolescent girls' dieting behaviour and their body image.
Starting from imitation and social learning theories, we study
whether adolescent girls tend to model their peer group's dieting
behaviours, thus influencing their body image construction. Our
study was conducted through a survey administered to a cluster sample
of 466 adolescent high school girls in Lisbon city public schools. Our
main findings point to an association between girls' and peers'
dieting behaviours, thus reinforcing the modelling hypothesis.
Abstract: Color image segmentation can be considered as a
clustering procedure in feature space. k-means and its adaptive
version, i.e. the competitive learning approach, are powerful tools
for data clustering. However, k-means and competitive learning suffer
from several drawbacks, such as the dead-unit problem and the need to
pre-specify the number of clusters. In this paper, we explore the
use of the competitive and cooperative learning (CCL) approach to perform
color image segmentation. In the competitive and cooperative
learning approach, seed points not only compete with each other, but
the winner also dynamically selects several nearest
competitors to form a cooperative team that adapts to the input
together. As a result, the approach can automatically select the correct number
of clusters and avoid the dead-unit problem. Experimental
results show that CCL can obtain better segmentation results.
Abstract: In this study, communities of ammonia-oxidizing
archaea (AOA) and ammonia-oxidizing bacteria (AOB) in nitrifying
activated sludge (NAS) prepared by enriching sludge from a
municipal wastewater treatment plant in three continuous-flow
reactors receiving an inorganic medium containing different
ammonium concentrations of 2, 10, and 30 mM NH4+-N (NAS2,
NAS10, and NAS30, respectively) were investigated using molecular
analysis. Results suggested that almost all AOA clones from NAS2,
NAS10, and NAS30 fell into the same AOA cluster and AOA
communities in NAS2 and NAS10 were more diverse than those of
NAS30. In contrast to AOA, AOB communities clearly shifted
from the seed sludge to the enriched NASs, and the AOB
communities differed among the enriched NASs. The seed sludge contained
members of the N. communis cluster and the N. oligotropha cluster. After it
was enriched under various ammonium loads, members of the N.
communis cluster disappeared from all enriched NASs. AOB with
high affinity to ammonia were present in NAS2, AOB with low affinity
to ammonia were present in NAS30, and both types of AOB survived in
NAS10. These results demonstrate that ammonium load significantly
influenced AOB communities, but not AOA communities, in the enriched
NASs.
Abstract: Screening makes it possible to detect the tumor lesions of cervical cancer. It is particularly complex and time-consuming because it consists of seeking a few abnormal cells among a cluster of normal cells. In this paper, we present our proposed computer system for helping doctors in screening for cervical cancer. Since the diagnosis of malignancy is based on the set of atypical morphological details of all cells, we present an unsupervised genetic algorithm for the separation of cell components, because the diagnosis is made by analysis of the nucleus and the cytoplasm. We also give the various algorithms used for computing the morphological characteristics of cells (nucleus/cytoplasm ratio, cellular deformity, ...) necessary for the recognition of the illness.
Abstract: Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extraction processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is a clustering system that determines the clustering model for data analysis and evaluates the results by itself. This system can make a clustering model more rapidly, objectively and accurately than a human analyst. The methodology for the automatic clustering intelligent system is a multi-agent system that comprises a clustering agent and a cluster performance evaluation agent. Each agent exchanges information about clusters with the other agent, and the system determines the optimal number of clusters through this information. Experiments using data sets in the UCI Machine Learning Repository are performed in order to prove the validity of the system.
Abstract: A signature is an individual characteristic of a
person that can be used for his or her validation. For such applications,
proper modeling is essential. Here we propose an offline signature
recognition and verification scheme that extracts
several features, including one hybrid set, from the input signature
and compares them with the already trained forms. Feature points
are classified using statistical parameters such as mean and variance.
To make the system robust, the scanned signature is normalized in slant
using a very simple algorithm, which is
found to be very helpful. The slant correction is further aided by the
use of an Artificial Neural Network (ANN). The suggested scheme
discriminates between original and forged signatures for simple
and random forgeries. The primary objective is to reduce the two
crucial parameters, False Acceptance Rate (FAR) and False Rejection
Rate (FRR), with less training time, with the intention of making the
system dynamic using a cluster of ANNs forming a multiple classifier
system.
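The two error rates named above can be made concrete with a short sketch. It assumes a verifier that outputs a similarity score and accepts a signature when the score reaches a threshold; the scores, the threshold and the function name `far_frr` are hypothetical and do not come from the paper.

```python
def far_frr(genuine_scores, forgery_scores, threshold):
    """Return (FAR, FRR) for a verifier that accepts scores >= threshold."""
    # FAR: fraction of forgeries that are wrongly accepted.
    false_accepts = sum(1 for s in forgery_scores if s >= threshold)
    # FRR: fraction of genuine signatures that are wrongly rejected.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(forgery_scores),
            false_rejects / len(genuine_scores))


genuine = [0.9, 0.8, 0.75, 0.4]      # hypothetical scores for true signatures
forged = [0.2, 0.3, 0.6, 0.1, 0.5]   # hypothetical scores for forgeries
far, frr = far_frr(genuine, forged, threshold=0.55)
```

Raising the threshold lowers FAR and raises FRR, so a scheme that reduces both at once has to improve the scores themselves.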
Abstract: Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points that are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: it depends on the initial state and converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many studies in clustering have sought to overcome the local optima problem. This paper presents an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N objects into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.
Abstract: A model to identify the lifetime of a target tracking
wireless sensor network is proposed. The model is a static cluster-based
architecture with two aims. First, it seeks to
increase the lifetime of the target tracking wireless sensor network.
Second, it seeks to enable good localization results with low energy
consumption for each sensor in the network. The model consists of
heterogeneous sensors, and each sensing member node in a cluster
uses two operation modes: active mode and sleep mode. The
performance results illustrate that the proposed architecture consumes
less energy and achieves a longer lifetime than centralized and dynamic
clustering architectures for target tracking sensor networks.
Abstract: Water quality and freshwater fish diversity at nine
waterfalls in Khao Luang National Park, Thailand, were examined.
The streams were shallow and fast flowing, with clear water and rocky and
sandy substrate. The mean water quality values of the waterfalls at Khao Luang
National Park were as follows: pH 7.50, air temperature 24.27 °C,
water temperature 26.37 °C, dissolved oxygen 7.88 mg/l, hardness
4.44-21.33 mg/l, alkalinity 3.55-11.88 mg/l (as CaCO3). Twenty fish
species belonging to nine families were found at Khao Luang National
Park. A cluster analysis of water quality
revealed that the waterfalls at Khao Luang National Park were
divided into two groups, A and B. Group A comprised two
waterfalls (i.e. Aie Kaew and Wangmaipak) that flowed to the Gulf of
Thailand side. Group B comprised seven waterfalls (i.e. Promlok,
Kalom, Nuafa, Suankun, Soidaw, Suanhai, and Thapae) that flowed to
the Andaman Sea side (Fig. 2). The Cyprinids represented the major
species in all the waterfalls, comprising 45%.
Abstract: Data clustering is an important data exploration technique
with many applications in data mining. We present an enhanced
version of the well known single link clustering algorithm. We will
refer to this algorithm as DCBOR. The proposed algorithm alleviates
the chain effect by removing the outliers from the given dataset.
So this algorithm provides outlier detection and data clustering
simultaneously. This algorithm does not need to update the distance
matrix, since it merges the k nearest
objects in one step and the cluster continues to grow as long as possible
under a specified condition. The algorithm consists of two phases:
in the first phase, it removes the outliers from the input dataset; in
the second phase, it performs the clustering process. This algorithm
discovers clusters of different shapes, sizes and densities, and requires
only one input parameter, which represents a threshold for
outlier points. The value of the input parameter ranges from 0 to
1. The algorithm supports the user in determining an appropriate
value for it. We have tested this algorithm on different datasets
that contain outliers and clusters connected by chains of density points,
and the algorithm discovers the correct clusters. The results of
our experiments demonstrate the effectiveness and the efficiency of
DCBOR.
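The chain effect that motivates removing outliers first can be reproduced with a small sketch. This is ordinary threshold-based single-link clustering (points are linked when they are no farther apart than a gap), not DCBOR itself; the function name `single_link`, the `max_gap` parameter and the toy points are assumptions made for this example.

```python
def single_link(points, max_gap):
    """Single-link clustering cut at max_gap; returns a cluster id per point."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if (dx * dx + dy * dy) ** 0.5 <= max_gap:
                parent[find(i)] = find(j)  # merge the two clusters
    return [find(i) for i in range(len(points))]


left = [(0.0, 0.0), (1.0, 0.0)]
right = [(5.0, 0.0), (6.0, 0.0)]
bridge = [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]  # chain of outlier-like points

separate = single_link(left + right, max_gap=1.0)
chained = single_link(left + right + bridge, max_gap=1.0)
```

Without the bridge points the two groups stay apart; with them, single link merges everything into one cluster, which is why discarding such points before clustering helps.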
Abstract: A network of coupled stochastic oscillators is
proposed for modeling a cluster of entangled qubits that is
exploited as a computation resource in one-way quantum
computation schemes. A qubit model has been designed as a
stochastic oscillator formed by a pair of coupled limit cycle
oscillators with chaotically modulated limit cycle radii and
frequencies. The qubit simulates the behavior of the electric field of a
polarized light beam and adequately imitates the states of a two-level
quantum system. A cluster of entangled qubits can be associated
with a beam of polarized light, the degree of light polarization being
directly related to the degree of cluster entanglement. An oscillatory network
imitating a qubit cluster is designed, and a system of equations for
the network dynamics has been written. Constructions of one-qubit
gates are suggested. The change in the degree of cluster entanglement caused
by measurements can be calculated exactly.
Abstract: In this paper, a Land Marks for Unique Addressing (LMUA) algorithm is developed to generate a unique ID for each node, which leads to the formation of overlapping or non-overlapping clusters based on the unique ID. To overcome the drawback of the LMUA algorithm, the concept of clustering is introduced. Based on the clustering concept, a Land Marks for Unique Addressing and Clustering (LMUAC) algorithm is developed to construct strictly non-overlapping clusters, classify the nodes into cluster heads, member nodes and gateway nodes, and generate the hierarchical code for the cluster heads to operate in the level-one hierarchy for wireless communication switching. Whether the existing network can be expanded without modifying the cost of adding a cluster head is shown. The developed algorithm shows one way of efficiently constructing the
Abstract: Currently, one of the main directions of development in
Kazakhstan is based on the clustering of economic operations,
which provides for the organization and concentration of
production capacity in one region or in the most optimal system. In the
modern economic literature, clustering is regarded as one of the most
effective tools for making businesses competitive and improving the
businesses themselves.