Abstract: To accelerate similarity search in high-dimensional databases, we propose a new hierarchical indexing method. It is composed of an offline phase and an online phase, and our contribution concerns both. In the offline phase, after grouping the data into clusters and constructing a hierarchical index, the main originality of our contribution is a method for constructing bounding forms of clusters that avoid overlapping. In the online phase, this idea considerably improves the performance of similarity search, and we have also developed a search algorithm adapted to the index. Our method, named NOHIS (Non-Overlapping Hierarchical Index Structure), uses Principal Direction Divisive Partitioning (PDDP) as its clustering algorithm. The principle of PDDP is to divide the data recursively into two sub-clusters; the division uses the hyperplane that is orthogonal to the principal direction derived from the covariance matrix and passes through the centroid of the cluster being divided. The data of each of the two resulting sub-clusters are enclosed in a minimum bounding rectangle (MBR). The two MBRs are oriented according to the principal direction, which ensures that the two forms do not overlap. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and the SR-tree in processing k-nearest neighbor queries.
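As a rough illustration of the PDDP split described in the abstract above, the following sketch divides a point set with the hyperplane through the centroid that is orthogonal to the principal direction. It is a minimal sketch and not the NOHIS implementation: the function names are hypothetical, the principal direction is approximated by power iteration (an assumption, chosen to avoid external libraries), and the MBR construction is not shown.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    d = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def principal_direction(points, iters=200, seed=0):
    """Approximate the leading eigenvector of the covariance matrix.

    Assumption: power iteration on X^T X of the centered data, which has
    the same eigenvectors as the covariance matrix.
    """
    c = centroid(points)
    d = len(c)
    centered = [[p[j] - c[j] for j in range(d)] for p in points]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(len(centered)))
             for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: no direction to find
            break
        v = [x / norm for x in w]
    return c, v


def pddp_split(points):
    """Split points by the sign of their projection onto the principal direction."""
    c, v = principal_direction(points)
    left, right = [], []
    for p in points:
        s = sum((p[j] - c[j]) * v[j] for j in range(len(c)))
        (left if s <= 0 else right).append(p)
    return left, right
```

Applying `pddp_split` recursively to each half yields the binary tree of clusters; which half is called `left` depends on the sign of the eigenvector and carries no meaning.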
Abstract: In this paper, a Land Marks for Unique Addressing (LMUA) algorithm is developed to generate a unique ID for each node, which leads to the formation of overlapping or non-overlapping clusters based on the unique ID. To overcome the drawback of the LMUA algorithm, the concept of clustering is introduced. Based on this concept, a Land Marks for Unique Addressing and Clustering (LMUAC) algorithm is developed to construct strictly non-overlapping clusters, classify the nodes into cluster heads, member nodes and gateway nodes, and generate the hierarchical code for the cluster heads to operate in the level-one hierarchy for wireless communication switching. Whether the existing network can be expanded without modifying the cost of adding a cluster head is shown. The developed algorithm shows one way of efficiently constructing the
Abstract: Biclustering aims at identifying several biclusters that
reveal potential local patterns from a microarray matrix. A bicluster is
a sub-matrix of the microarray consisting of only a subset of genes
co-regulated in a subset of conditions. In this study, we extend the
idea of subspace clustering to present a K-biclusters clustering (KBC)
algorithm for the microarray biclustering problem. Besides minimizing
the dissimilarities between genes and bicluster centers within all
biclusters, the objective function of the KBC algorithm additionally
takes into account how to minimize the residues within all biclusters
based on the mean square residue model. In addition, the objective
function also maximizes the entropy of conditions to stimulate more
conditions to contribute to the identification of biclusters. The KBC
algorithm adopts a K-means type clustering process to efficiently
optimize the partition into K biclusters. A set of experiments
on a practical microarray dataset is presented to show the
performance of the proposed KBC algorithm.
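The mean square residue mentioned in the abstract above is commonly computed as the average squared deviation of each entry from the additive model given by its row mean, column mean and overall mean. The sketch below shows only that score for one candidate bicluster; the function name and argument layout are hypothetical, and the full KBC objective (dissimilarity and entropy terms) is not reproduced.

```python
def mean_square_residue(matrix, rows, cols):
    """Mean square residue of the sub-matrix selected by rows and cols.

    residue(i, j) = a_ij - row_mean_i - col_mean_j + overall_mean
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_r * n_c)
```

A perfectly additive sub-matrix (each row equal to another row plus a constant) scores zero, so lower values indicate more coherent biclusters.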
Abstract: In practice, wireless networks have the property that
the signal strength attenuates with the distance from the
base station, so quality of service could improve if the nodes two
hops away are considered. In this paper, we propose a
procedure to identify delay preserving substructures for a given
wireless ad-hoc network using a new graph operation, G^2 - E(G) =
G* (the edge difference of the square graph of a given graph and the
original graph). This operation helps to analyze some induced
substructures that preserve delay in communication among their nodes.
Applied to a given graph, the operation induces a graph G* in which the
1-hop neighbors of any node are at 2-hop distance in the original
network. In this paper, we also identify some delay preserving
substructures in G*: (i) a set of nodes that are mutually
at 2-hop distance in G forms a clique in G*; (ii) a set of nodes
that forms an odd cycle C_{2k+1} in G forms an odd cycle in G*,
and a set of nodes that forms an even cycle C_{2k} in G forms
two disjoint companion cycles (of the same parity, odd or even) of length k
in G*; (iii) every path of length 2k+1 or 2k in G induces two
disjoint paths of length k in G*; and (iv) a set of nodes in G* that
induces a maximal connected subgraph with radius 1
identifies a substructure with radius 2 and diameter at most 4 in
G. These delay preserving substructures will behave as good
clusters in the original network.
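The operation G* = G^2 - E(G) from the abstract above can be sketched directly from its definition: two nodes are adjacent in G* exactly when they share a neighbor in G but are not themselves adjacent in G. The adjacency-set representation and the helper names below are assumptions made for illustration.

```python
def cycle_graph(n):
    """Adjacency sets of the cycle on nodes 0..n-1 (helper for examples)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def two_hop_graph(adj):
    """Return G* = G^2 - E(G) as adjacency sets.

    u and v are adjacent in G* iff their distance in G is exactly 2.
    """
    star = {u: set() for u in adj}
    for u in adj:
        for w in adj[u]:          # w is a 1-hop neighbor of u
            for v in adj[w]:      # v is reachable from u in two hops
                if v != u and v not in adj[u]:
                    star[u].add(v)
    return star
```

For the even cycle on six nodes the result splits into the two triangles {0, 2, 4} and {1, 3, 5}, which matches case (ii) with k = 3.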
Abstract: Understanding the cell's large-scale organization is an interesting task in computational biology. Protein-protein interactions can reveal important aspects of the organization and function of the cell. Here, we investigated the correspondence between protein interactions and function in yeast. We obtained the correlations among the set of proteins. These correlations were then clustered using both hierarchical and biclustering methods. Detailed analyses of the proteins in each cluster were carried out by making use of their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. On the other hand, in hierarchical clustering, the dominance of one functional class is observed. In the light of the clustering data, we have verified some interactions which were not identified as core interactions in DIP, and we have characterized some functionally unknown proteins according to the interaction data and functional correlation. In brief, moving from interaction data to function, we observed correlated results on the relationship between interaction and function, which might give clues about the organization of the proteins and help to predict new interactions and to characterize the functions of unknown proteins.
Abstract: The Artificial Bee Colony (ABC) algorithm is a relatively new swarm intelligence technique for clustering. It produces higher-quality clusters than other population-based algorithms, but with poor energy efficiency, inconsistent cluster quality and typically slower convergence. Inspired by the energy-saving foraging behavior of natural honey bees, this paper presents a Quality and Quantity Aware Artificial Bee Colony (Q2ABC) algorithm to improve the cluster quality, energy efficiency and convergence speed of the original ABC. To evaluate the performance of the Q2ABC algorithm, experiments were conducted on a suite of ten benchmark UCI datasets. The results demonstrate that Q2ABC outperformed ABC and the K-means algorithm in the quality of the clusters delivered.
Abstract: The aim of this work was to detect genetic variability among a set of 40 castor genotypes using 8 RAPD markers. Amplification of the genomic DNA of the 40 genotypes, using RAPD analysis, yielded 66 fragments, with an average of 8.25 polymorphic fragments per primer. The number of amplified fragments ranged from 3 to 13, with the size of amplicons ranging from 100 to 1200 bp. The polymorphic information content (PIC) ranged from 0.556 to 0.895 with an average of 0.784, and the diversity index (DI) ranged from 0.621 to 0.896 with an average of 0.798. A dendrogram based on hierarchical cluster analysis using the UPGMA algorithm was prepared; the analyzed genotypes were grouped into two main clusters, and only two genotypes could not be distinguished. Knowledge of the genetic diversity of castor can be used in future breeding programs for increased oil production for industrial uses.
Abstract: Due to heavy energy constraints in WSNs, clustering is
an efficient way to manage the energy in sensors. Many
methods have already been proposed in the area of clustering, and research is
still going on to make clustering more energy efficient. In this paper
we propose minimum spanning tree (MST) based clustering using a
divide and conquer approach. MST based clustering was first
proposed in the 1970s for large databases. Here we take the divide and
conquer approach and implement it for wireless sensor networks
with the constraints attached to sensor networks. The divide and
conquer approach is implemented so that we do not have to
construct the whole MST before clustering: we find an edge
that will be part of the MST of the corresponding graph and
divide the graph into clusters right there if that edge can
be removed under certain constraints, thereby saving a lot of
computation.
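The idea in the abstract above of cutting MST edges while they are being discovered, rather than after the whole tree is built, can be sketched with Kruskal-style edge processing. This is a minimal sketch and not the paper's divide and conquer procedure: the function name is hypothetical, and the single `max_weight` threshold stands in for the unspecified sensor-network constraints.

```python
def mst_clusters(n, edges, max_weight):
    """Cluster nodes 0..n-1 by joining them along light MST edges only.

    edges: iterable of (weight, u, v) tuples.
    max_weight: assumed constraint; a would-be MST edge heavier than this
    is treated as removed, so its endpoints stay in different clusters.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        if w > max_weight:
            break  # every remaining edge is at least this heavy
        ru, rv = find(u), find(v)
        if ru != rv:  # this edge would belong to the MST
            parent[ru] = rv

    clusters = {}
    for node in range(n):
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())
```

Because processing stops at the first edge above the threshold, the heavier part of the tree is never constructed, which is the source of the saved computation in this simplified setting.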
Abstract: The purpose of this work is to establish the theoretical
foundations for calculating and designing the sublimation-condensation
processes in chemical apparatuses intended
for the production of ultrafine powders of crystalline and amorphous
materials with controlled fractional composition. A theoretical analysis
of the primary processes of nucleation and of the growth kinetics of
clusters as a function of the degree of supersaturation and the
homogeneous or heterogeneous nature of nucleation has been carried
out. Engineering design procedures for desublimation processes
have been proposed and tested for a modification of the Claus process.
Abstract: The minimal condition for symmetry breaking in the morphogenesis of a cellular population was investigated using cellular automata based on reaction-diffusion dynamics. In particular, the study looked for the possibility of the emergence of branching structures due to mechanical interactions. The model used two types of cells and an external gradient. The results showed that the external gradient influenced the movement of type-I cells, and also revealed that clusters formed by type-II cells acted as a barrier to the movement of type-I cells.
Abstract: Modern times call on organizations to play an active role
in the social arena through Corporate Social Responsibility (CSR).
The objective of this research was to test the hypothesis that there is a
positive relation between social performance and economic
performance, and whether there is a positive correlation between social
performance and financial-economic performance. To test these
hypotheses, a measure of social performance, based on the Green Book
of the Commission of the European Community, was applied to a group of
nineteen top Portuguese companies listed on the PSI 20 index
over a period of five years, from 2005 to 2009. A cluster
analysis was applied to group companies by their social performance
and to compare and correlate their economic performance. Results
indicate that the companies with better social performance are not
the ones with better economic performance, and suggest that
the middle path might provide a good relation between CSR and economic
performance, as a basis for sustainable development.
Abstract: K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, in which the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching (or mismatching) measure. The weights of attribute values contribute much to clustering; thus, in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of the frequencies of attribute values in the cluster and in the data set. The new weighted measure is evaluated on data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative and show that the new measure generates clusters with high purity.
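For reference, the two baseline ingredients of K-Modes named in the abstract above, the simple matching dissimilarity and the per-attribute mode, can be sketched as follows. The function names are hypothetical, and the paper's weighted measure is deliberately not reproduced because its exact formula is not given here.

```python
from collections import Counter


def simple_matching(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_mode(records):
    """Per-attribute most frequent value of a non-empty list of records."""
    return tuple(Counter(column).most_common(1)[0][0]
                 for column in zip(*records))


def assign(record, modes):
    """Index of the mode closest to the record under simple matching."""
    return min(range(len(modes)),
               key=lambda i: simple_matching(record, modes[i]))
```

A K-Modes iteration alternates `assign` over all records with `cluster_mode` over each resulting cluster, in the same way K-Means alternates assignment and mean updates.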
Abstract: A new blind symbol by symbol equalizer is proposed.
The operation of the proposed equalizer is based on the geometric
properties of the two dimensional data constellation. An unsupervised
clustering technique is used to locate the clusters formed by the
received data. The symmetric properties of the cluster labels are
subsequently utilized in order to label the clusters. Following this
step, the received data are compared to clusters and decisions are
made on a symbol by symbol basis, by assigning to each data
the label of the nearest cluster. The operation of the equalizer is
investigated both in linear and nonlinear channels. The performance
of the proposed equalizer is compared to the performance of a CMA-based
blind equalizer.
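The final decision step in the abstract above, assigning each received sample the label of the nearest cluster, can be sketched as below. This is only the decision rule: the cluster centers and their labels are assumed to be already available from the clustering and labelling stages, and the function name and the use of complex numbers for the two-dimensional constellation are illustrative choices.

```python
def nearest_label(sample, centers):
    """Label of the cluster center closest to a received sample.

    sample: complex number (in-phase + 1j * quadrature).
    centers: dict mapping label -> complex cluster center (assumed given).
    """
    return min(centers, key=lambda label: abs(sample - centers[label]))


def decide(samples, centers):
    """Symbol-by-symbol decisions for a sequence of received samples."""
    return [nearest_label(s, centers) for s in samples]
```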
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changing. Until recently, business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding the cellular functions of DNA sequences. DNA chips are now being used to perform experiments, and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms, such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem, taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with test data of 100 to 1000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.
Abstract: Competitive learning is an adaptive process in
which the neurons in a neural network gradually become sensitive to
different input pattern clusters. The basic idea behind Kohonen's
Self-Organizing Feature Maps (SOFM) is competitive learning.
SOFM can generate mappings from high-dimensional signal spaces
to lower dimensional topological structures. The main features of this
kind of mappings are topology preserving, feature mappings and
probability distribution approximation of input patterns. To overcome
some limitations of SOFM, e.g., a fixed number of neural units and a
topology of fixed dimensionality, Growing Self-Organizing Neural
Network (GSONN) can be used. GSONN can change its topological
structure during learning. It grows by learning and shrinks by
forgetting. To speed up training and convergence, a new variant
of GSONN, twin growing cell structures (TGCS), is presented here.
This paper first gives an introduction to competitive learning, SOFM
and its variants. Then, we discuss some GSONNs with fixed
dimensionality, which include growing cell structures, its variants
and the author's model, TGCS. The paper ends with a comparison of
test results and conclusions.
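The competitive learning principle introduced in the abstract above can be illustrated with a plain winner-take-all update: the unit closest to the input wins and moves toward it. This is a minimal sketch with a hypothetical function name; it omits the neighborhood update of SOFM and the growing and shrinking mechanisms of GSONN and TGCS.

```python
def competitive_step(weights, x, lr=0.1):
    """One winner-take-all update; returns the index of the winning unit.

    weights: list of weight vectors (lists of floats), modified in place.
    x: input vector of the same dimension.
    lr: learning rate (an assumed constant).
    """
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    winner = dists.index(min(dists))
    # Move only the winner toward the input.
    weights[winner] = [wi + lr * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner
```

Repeating this step over many inputs makes each unit drift toward the center of the input cluster it wins most often.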
Abstract: Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.
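One plausible reading of the initialization in the abstract above is sketched below: pick the dimension with maximum variance, sort the points along it, cut the sorted list into k contiguous groups, and use the median point of each group as an initial center. The grouping rule and the function name are assumptions; the abstract does not specify how the medians are extracted.

```python
def init_centers(points, k):
    """Initial k-means centers from medians along the max-variance dimension.

    Assumption: contiguous, near-equal-size groups along the chosen dimension.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    d = len(points[0])

    def variance(j):
        mean = sum(p[j] for p in points) / n
        return sum((p[j] - mean) ** 2 for p in points) / n

    dim = max(range(d), key=variance)
    ordered = sorted(points, key=lambda p: p[dim])
    centers = []
    for g in range(k):
        lo, hi = g * n // k, (g + 1) * n // k
        group = ordered[lo:hi]
        centers.append(group[len(group) // 2])  # median point of the group
    return centers
```

Unlike random initialization, this choice is deterministic, so repeated runs of k-means start from the same centers.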
Abstract: This paper aims at identifying and analyzing the
knowledge transmission channels in textile and clothing clusters
located in Brazil and in Europe. Primary data was obtained through
interviews with key individuals. The collection of primary data was
carried out based on a questionnaire with ten categories of indicators
of knowledge transmission. Secondary data was also collected
through a literature review and from the websites of international
organizations. Similarities related to the use of the main transmission channels
of knowledge are observed in all cases. The main similarities are:
influence of suppliers of machinery, equipment and raw materials;
imitation of products and best practices; training promoted by
technical institutions and businesses; and cluster companies being
open to acquire new knowledge. The main differences lie in the
relationship between companies, which is more intense in Europe
than in Brazil. Differences also
occur in the importance and frequency of the relationship with the
government, with the cultural environment, and with the activities of
research and development. Factors were also found that reduce the
importance of geographical proximity in the transmission of knowledge
and in generating trust and establishing collaborative
behavior.
Abstract: Gene Ontology is now widely used by researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge from the Gene Ontology into gene clustering. However, the increase in the size of the Gene Ontology has caused problems in maintaining and processing it. One way to keep it accessible is to cluster it into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of a modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
Abstract: In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First, the dataset is mapped into a binary image plane. The synthesized image is then processed using efficient image processing techniques to cluster the data in the dataset. Hence, the algorithm avoids an exhaustive search to identify clusters. The algorithm considers only a small subset of the data that contains critical boundary information sufficient to identify the contained clusters. Compared to available data clustering techniques, the proposed algorithm produces results of similar quality and outperforms them in execution time and storage requirements.
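The general idea in the abstract above, mapping data onto a binary image and clustering with image operations, can be illustrated with a grid rasterization followed by connected-component labelling. This is a simplified stand-in and not the paper's algorithm: the function name, the `cell` size, the restriction to 2-D points and the use of 8-connectivity are all assumptions, and the boundary-based processing is not modelled.

```python
from collections import deque


def grid_clusters(points, cell):
    """Label 2-D points by the connected component of their grid pixel.

    points: list of (x, y) pairs; cell: pixel side length (assumed parameter).
    Returns one integer cluster label per point.
    """
    # Rasterize: a pixel is "on" if at least one point falls into it.
    pixels = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // cell), int(y // cell))
        pixels.setdefault(key, []).append(idx)

    labels = [None] * len(points)
    seen = set()
    cluster_id = 0
    for start in pixels:
        if start in seen:
            continue
        # Breadth-first flood fill over 8-connected "on" pixels.
        queue = deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            for i in pixels[(cx, cy)]:
                labels[i] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in pixels and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels
```

The cost depends on the number of occupied pixels rather than on pairwise distances between points, which is why this family of methods avoids an exhaustive search.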
Abstract: Grid computing is a group of clusters connected over
high-speed networks that involves coordinating and sharing
computational power, data storage and network resources operating
across dynamic and geographically dispersed locations. Resource
management and job scheduling are critical tasks in grid computing.
Resource selection becomes challenging due to heterogeneity and
dynamic availability of resources. Job scheduling is an NP-complete
problem and different heuristics may be used to reach an optimal or
near optimal solution. This paper proposes a model for resource and
job scheduling in dynamic grid environment. The main focus is to
maximize the resource utilization and minimize processing time of
jobs. The grid resource selection strategy is based on a Max Heap Tree
(MHT), which is best suited to large-scale applications; the root node of
the MHT is selected for job submission. A job grouping concept is used to
maximize resource utilization when scheduling jobs in grid
computing. The proposed resource selection model and job grouping
concept are used to enhance the scalability, robustness, efficiency and
load balancing ability of the grid.
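The root-of-the-max-heap selection rule from the abstract above can be sketched with Python's `heapq`, which is a min-heap, by storing negated capacities. The function name, the single numeric capacity per resource and the rule of subtracting each job's demand from the chosen resource are assumptions; job grouping is not modelled.

```python
import heapq


def schedule(resources, jobs):
    """Assign each job to the resource at the root of a max heap.

    resources: dict name -> available capacity (assumed single metric).
    jobs: list of (job_name, demand) pairs, processed in order.
    Returns a dict job_name -> resource name.
    """
    # Negate capacities so the smallest heap entry is the largest capacity.
    heap = [(-capacity, name) for name, capacity in resources.items()]
    heapq.heapify(heap)
    assignment = {}
    for job, demand in jobs:
        neg_capacity, name = heapq.heappop(heap)  # root of the max heap
        assignment[job] = name
        # Reinsert the resource with its remaining capacity.
        heapq.heappush(heap, (neg_capacity + demand, name))
    return assignment
```

Each selection and reinsertion costs O(log n) for n resources, which is what makes a heap attractive when the resource pool is large.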