Abstract: Wavelet neural networks (WNNs) have emerged as a viable alternative to the widely studied multilayer perceptrons (MLPs) since their first implementation. In this paper, we apply several clustering algorithms, namely K-means (KM), Fuzzy C-means (FCM), symmetry-based K-means (SBKM), symmetry-based Fuzzy C-means (SBFCM) and modified point symmetry-based K-means (MPKM), to choose the translation parameters of a WNN. These modified WNNs are then applied to heterogeneous cancer classification using benchmark microarray data and compared against the conventional WNN with random initialization. Experimental results show that a WNN classifier with the MPKM algorithm is more precise than both the conventional WNN and the WNNs with the other clustering algorithms.
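As a rough illustration of the simplest variant named above, plain K-means can supply the translation vectors of the hidden wavelet units by using the cluster centres of the training samples. The sketch below is not the authors' implementation: the function name `kmeans_translations`, its parameters and the empty-cluster handling are assumptions made for illustration, and the symmetry-based variants (SBKM, SBFCM, MPKM) are not shown.

```python
import numpy as np

def kmeans_translations(X, n_hidden, n_iter=100, seed=0):
    """Return n_hidden K-means centres of X (hypothetical helper).

    Each row of the result is meant to serve as the translation
    vector of one hidden wavelet unit.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise the centres with n_hidden distinct training samples.
    centres = X[rng.choice(len(X), size=n_hidden, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centres = centres.copy()
        for k in range(n_hidden):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its old centre
                new_centres[k] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break  # assignments have stabilised
        centres = new_centres
    return centres
```

Compared with random initialization, the centres returned here lie where the training data are dense, which is the motivation the abstract gives for clustering-based initialization.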
Abstract: Understanding the cell's large-scale organization is an interesting task in computational biology, and protein-protein interactions can reveal important aspects of the organization and function of the cell. Here, we investigated the correspondence between protein interactions and function in yeast. We obtained the correlations among a set of proteins and clustered these correlations using both hierarchical clustering and biclustering methods. The proteins in each cluster were analyzed in detail using their functional annotations. We found that some functional classes appear together in almost all biclusters, whereas in hierarchical clustering the dominance of one functional class is observed. In light of the clustering data, we verified some interactions that were not identified as core interactions in DIP, and we characterized some functionally unknown proteins according to the interaction data and functional correlation. In brief, moving from interaction data to function, we observed correlations between interaction and function that might give clues about the organization of the proteins and help to predict new interactions and to characterize the functions of unknown proteins.
Abstract: This study describes tree growth for four species groups of commercial timber in the tropical rainforest of Koh Kong province, Cambodia. Growth of these four groups was successfully simulated at 5-year intervals through year 60. Data were obtained from twenty permanent sample plots over a period of thirteen years. The aim of this study was to develop a stand table simulation system of tree growth by species group. Five steps were involved in developing the tree growth simulation: aggregating the tree species into meaningful groups using cluster analysis; allocating the trees to diameter classes by species group; observing the diameter movement of each species group; calculating the diameter growth rate, mortality rate and recruitment rate using mathematical formulas; and creating a simulation equation by combining those parameters. The results showed dissimilar diameter growth among the species groups.
Abstract: The Artificial Bee Colony (ABC) algorithm is a relatively new swarm intelligence technique for clustering. It produces higher quality clusters than other population-based algorithms, but it has poor energy efficiency, inconsistent cluster quality and typically slower convergence. Inspired by the energy-saving foraging behavior of natural honey bees, this paper presents a Quality and Quantity Aware Artificial Bee Colony (Q2ABC) algorithm that improves the quality of cluster identification, energy efficiency and convergence speed of the original ABC. To evaluate the performance of the Q2ABC algorithm, experiments were conducted on a suite of ten benchmark UCI datasets. The results demonstrate that Q2ABC outperformed the ABC and K-means algorithms in the quality of clusters delivered.
Abstract: Annotation of a protein sequence is pivotal for understanding its function. The accuracy of manual annotation provided by curators is still questionable because of its weaker evidence strength, and manual annotation is also difficult and time-consuming. A number of computational methods and tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult for bioscientists to set up, or depend on time-intensive, blind sequence similarity searches such as the Basic Local Alignment Search Tool. This paper introduces a new method for assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. The method is based entirely on Gene Ontology data and annotations. Two problems had to be solved to realize this method. The first is splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that are easy to access and process, so that these files can be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second is searching for a set of Gene Ontology terms that are semantically similar to a given query. The macro and micro problems involved, their solutions and the objective of this study are described. This paper also describes protein sequence annotation and the Gene Ontology, and presents the methodology of the study and the Gene Ontology based protein sequence annotation tool, extended UTMGO. Its basic version, a Gene Ontology browser based on semantic similarity search, is also introduced.
Abstract: This paper describes a probabilistic method for
three-dimensional object recognition using a shared pool of surface
signatures. This technique uses flatness, orientation, and convexity
signatures that encode the surface of a free-form object into three
discriminative vectors, and then creates a shared pool of data by
clustering the signatures using a distance function. This method
applies Bayes' rule in the recognition process, and it is extensible
to a large collection of three-dimensional objects.
Abstract: As a structure for processing string problems, the suffix
array is widely known and extensively studied. But if the
string access pattern follows the "90/10" rule, a suffix array cannot take
advantage of the fact that we often search for something that we have just
found. Although the splay tree is an efficient data structure for small
documents when the access pattern follows the "90/10" rule, it
requires many structures and an excessive amount of pointer
manipulation to process and search large documents efficiently.
In this paper, we propose a new and conceptually powerful
data structure, called the splay suffix array (SSA), for string search. This
data structure combines the features of splay trees and suffix arrays into
a new approach that is suitable for implementation on both
conventional and clustered computers.
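For orientation, the sketch below shows only the plain suffix array half of the idea: a naive construction and a binary search for all occurrences of a pattern. It is not the SSA structure itself, whose splay-style adaptation to repeated queries is not specified in the abstract; the function names are illustrative assumptions.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting the suffixes directly; adequate for
    illustration, too slow for large documents.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return the sorted start positions where pattern occurs in text."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

Every query here costs the same number of comparisons regardless of how recently the pattern was searched for, which is the limitation under "90/10" access patterns that the abstract describes.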
Abstract: In this paper, a model for an information retrieval
system is proposed which takes into account that knowledge about
documents and the information needs of users are dynamic. Two
methods are combined, one qualitative or symbolic and the other
quantitative or numeric, which are deemed suitable for many
clustering contexts, data analysis, concept exploration and
knowledge discovery. These two methods may be classified as
inductive learning techniques. In this model, they are introduced to
build "long term" knowledge about past queries and concepts in a
collection of documents. The "long term" knowledge can guide
and assist the user in formulating an initial query and can be
exploited in the process of retrieving relevant information. The
different kinds of knowledge are organized from different points of
view. This may be considered an enrichment of the exploration
level which is coherent with the concept of document/query
structure.
Abstract: Phylogenetic analysis using the most conserved
portions of the 18S rRNA gene revealed the phylogenetic relationship
between the two populations. DNA divergence analysis showed that the
nucleotide diversity values were -0.00838 for the Tanjung Dawai,
Kedah population and -0.00708 for the Cherating, Pahang population.
The net nucleotide divergence between populations (Da)
was -0.0073, indicating low polymorphism between the populations
studied. The total number of mutations in the Tanjung Dawai, Kedah
samples (73) was higher than in the Cherating, Pahang samples (59),
while 8 mutations were shared across the populations,
revealing evolutionary change in the genome of Malaysian T. gigas. The
tree topology of both populations, inferred using the Neighbour-joining
method by comparing 1791 bp of partial 18S rRNA sequence, revealed
that T. gigas haplotypes were clustered into seven clades, suggesting
that the populations are genetically diverse and derived
from a common ancestor.
Abstract: The literature has argued that firms based in industrial districts enjoy advantages in creating internal knowledge and absorbing external knowledge as a consequence of the knowledge flows and spillovers that exist in the district. However, empirical evidence showing how belonging to an industrial district affects the business processes of knowledge creation and absorption is scarce. Moreover, empirical research has not taken into account the influence of variations in the flows of knowledge circulating in each cluster. This study aims to extend the empirical evidence on the effect that the stock of shared competencies in industrial districts has on the business processes of knowledge creation and absorption, using data from an initial study of 952 firms and 35 industrial districts in Spain.
Abstract: This paper focuses on reducing the power consumption
of wireless sensor networks. To this end, a communication protocol
named LEACH (Low-Energy Adaptive Clustering Hierarchy) is modified.
We extend LEACH's stochastic cluster-head selection algorithm
by modifying the probability of each node becoming a cluster-head
based on the energy it requires to transmit to the sink. We present
an efficient energy-aware routing algorithm for wireless sensor
networks. Our contribution consists of a rotating selection of cluster-heads
that considers first the remoteness of the nodes from the sink and then
the residual energy of the network nodes. This choice allows a better distribution
of the transmission energy in the network. The cluster-head
selection algorithm is completely decentralized. Simulation results
show that the energy consumption is significantly reduced compared with the
previous clustering-based routing algorithm for sensor networks.
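The stochastic election being modified can be sketched as follows. The threshold function is the standard LEACH formula; the per-node `weight` in [0, 1] that scales it is a hypothetical placeholder for whatever energy-based and distance-based factor the paper actually uses, which the abstract does not specify. All names here are illustrative assumptions.

```python
import random

def leach_threshold(p, round_no):
    """Standard LEACH threshold T(n) for a node that is still eligible.

    p is the desired fraction of cluster-heads per round. The threshold
    rises over a period of round(1/p) rounds so that every node serves once.
    """
    period = round(1 / p)
    return p / (1 - p * (round_no % period))

def elect_cluster_heads(nodes, p, round_no, rng):
    """Return the ids of nodes that elect themselves in this round.

    Each node decides locally, so the election stays decentralized.
    node["weight"] is an ASSUMED scaling factor in [0, 1], e.g. built from
    residual energy and distance to the sink; it is not the paper's formula.
    """
    t = leach_threshold(p, round_no)
    heads = []
    for node in nodes:
        if not node["eligible"]:
            continue  # already served as cluster-head in this period
        if rng.random() < t * node["weight"]:
            heads.append(node["id"])
    return heads
```

With `weight` fixed at 1 for every node this reduces to the original LEACH election; lowering the weight of a node makes it proportionally less likely to become a cluster-head.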
Abstract: The aim of this work was to detect genetic variability among a set of 40 castor genotypes using 8 RAPD markers. Amplification of genomic DNA of the 40 genotypes by RAPD analysis yielded 66 fragments, with an average of 8.25 polymorphic fragments per primer. The number of amplified fragments ranged from 3 to 13, with amplicon sizes ranging from 100 to 1200 bp. The polymorphic information content (PIC) ranged from 0.556 to 0.895 with an average of 0.784, and the diversity index (DI) ranged from 0.621 to 0.896 with an average of 0.798. A dendrogram based on hierarchical cluster analysis using the UPGMA algorithm was prepared; the analyzed genotypes were grouped into two main clusters, and only two genotypes could not be distinguished. Knowledge of the genetic diversity of castor can be used in future breeding programs for increased oil production for industrial uses.
Abstract: Because of the heavy energy constraints in WSNs, clustering is
an efficient way to manage the energy of sensors. Many
methods have already been proposed in the area of clustering, and research is
still ongoing to make clustering more energy efficient. In this paper
we propose minimum spanning tree (MST) based clustering using a
divide and conquer approach. MST-based clustering was first
proposed in the 1970s for large databases. Here we take the divide and
conquer approach and implement it for wireless sensor networks
under the constraints attached to sensor networks. The divide and
conquer approach is implemented so that we do not have to
construct the whole MST before clustering; instead, we find an edge
that will be part of the MST of the corresponding graph and
divide the graph into clusters right there if that edge can
be removed under certain constraints, thereby saving a lot of
computation.
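One way to picture clustering without first building the whole MST is a Kruskal-style pass that stops joining components once the next candidate edge violates a constraint. The sketch below uses a single assumed constraint, a maximum edge length `max_edge`; the paper's actual constraints and its divide and conquer scheme are not given in the abstract, so this is only an analogy, and the function name is hypothetical.

```python
import math
from itertools import combinations

def mst_threshold_clusters(points, max_edge):
    """Group point indices into clusters using MST edges up to max_edge.

    Edges are examined in increasing length, as in Kruskal's algorithm.
    An edge that joins two components is an MST edge; once edges exceed
    max_edge (ASSUMED constraint) they are treated as removed, so the
    remaining components are the clusters.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    for length, i, j in edges:
        if length > max_edge:
            break  # all remaining edges are longer; stop early
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # accept this MST edge
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The early `break` is where the saving comes from in this sketch: long edges are never merged or revisited, so the full tree is never materialized.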
Abstract: The major challenge faced by wireless sensor networks is security. Because of the dynamic and collaborative nature of sensor networks, the connected sensor devices can make the network unusable. To address this issue, a trust model is required to find malicious, selfish and compromised insiders by evaluating the trustworthiness of sensors in the network. Such a model supports decision-making processes in wireless sensor networks such as key pre-distribution, cluster head selection, data aggregation, routing and self-reconfiguration of sensor nodes. This paper discusses the kinds of trust models and the trust metrics used to address attacks by monitoring certain behavior of the network. It describes the major design issues in building a trust model and their countermeasures. It also discusses existing trust models used in various decision-making processes of wireless sensor networks.
Abstract: A mobile ad hoc network is a network of mobile nodes
without any notion of centralized administration. In such a network,
each mobile node behaves not only as a host which runs applications
but also as a router to forward packets on behalf of others. Clustering
has been applied to routing protocols to achieve efficient
communications. A CH network expresses the connected relationship
among cluster-heads. This paper discusses the methods for
constructing a CH network, and produces the following results: (1)
The running costs required by the three traditional methods for constructing a
CH network are not so different from each other in the static
circumstance, or in the dynamic circumstance. Their running costs in
the static circumstance do not differ from their costs in the dynamic
circumstance. Meanwhile, although the routing costs required for the
above three methods are not so different in the static circumstance, the
costs are considerably different from each other in the dynamic
circumstance. Their routing costs in the static circumstance are also
very different from their costs in the dynamic circumstance: the
former are one tenth of the latter. The routing cost in the dynamic
circumstance is mostly the cost of re-routing. (2) On the strength of
the above results, we discuss two new methods with regard to whether they
are tolerable in the dynamic circumstance, that is, whether the
number of re-routings is small. These new methods are revisions
of the traditional methods. We recommend
the method that produces the smallest routing cost in the dynamic
circumstance and therefore the smallest total cost.
Abstract: The purpose of this work is to establish the theoretical
foundations for calculating and designing the sublimation-condensation
processes in chemical apparatuses intended
for the production of ultrafine powders of crystalline and amorphous
materials with controlled fractional composition. A theoretical analysis
of the primary processes of nucleation and the growth kinetics of
clusters, depending on the degree of supersaturation and the
homogeneous or heterogeneous nature of nucleation, has been carried
out. Engineering design procedures for desublimation processes
have been proposed and tested for a modification of the Claus process.
Abstract: The minimal condition for symmetry breaking in the morphogenesis of a cellular population was investigated using cellular automata based on reaction-diffusion dynamics. In particular, the study looked for the possibility of branching structures emerging from mechanical interactions. The model used two types of cells and an external gradient. The results showed that the external gradient influenced the movement of type-I cells and that clusters formed by type-II cells acted as a barrier to the movement of type-I cells.
Abstract: Microarray data profiles gene expression on a whole-genome
scale and therefore provides a good way to study associations
between gene expression and the occurrence or progression of cancer.
More and more researchers have realized that microarray data is helpful
for predicting cancer samples. However, the dimension of gene
expression data is much larger than the sample size, which makes this
task very difficult. Therefore, how to identify the significant genes
causing cancer has become an urgent, popular and difficult research
topic. Many feature selection algorithms proposed in
the past focus on improving cancer predictive accuracy at the
expense of ignoring the correlations between features. In this
work, a novel framework (named SGS) is presented for stable gene
selection and efficient cancer prediction. The proposed framework
first runs a clustering algorithm to find gene groups in which the
genes have high correlation coefficients, then
selects the significant genes in each group with the Bayesian Lasso and
the important gene groups with the group Lasso, and finally builds a prediction
model on the shrunken gene space with an efficient classification
algorithm (such as SVM, 1NN or regression). Experimental
results on real-world data show that the proposed framework often
outperforms existing feature selection and prediction methods,
such as SAM, IG and Lasso-type prediction models.
Abstract: In this paper we use exponential particle swarm
optimization (EPSO) to cluster data. We then compare the
EPSO clustering algorithm, which uses an exponential variation
of the inertia weight, with the particle swarm optimization (PSO)
clustering algorithm, which uses a linear inertia weight. The
comparison is evaluated on five data sets. The experimental results
show that the EPSO clustering algorithm increases the chance of finding
the optimal positions because it decreases the number of failures. They also show
that the EPSO clustering algorithm has a smaller quantization error
than the PSO clustering algorithm, i.e., the EPSO clustering algorithm is
more accurate than the PSO clustering algorithm.
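The two ingredients compared above can be sketched briefly. The linear schedule is the usual linearly decreasing inertia weight; the exponential schedule shown is only one plausible form, since the abstract does not give the EPSO formula, so its shape and the `rate` parameter are assumptions. The quantization error is written as the mean, over non-empty clusters, of the average distance from a cluster's points to its centroid, which is a common definition in PSO clustering work and is likewise assumed here.

```python
import math

def linear_inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight decreasing linearly from w_start to w_end."""
    return w_start - (w_start - w_end) * t / t_max

def exponential_inertia(t, t_max, w_start=0.9, w_end=0.4, rate=5.0):
    """ASSUMED exponential schedule: decays from w_start towards w_end."""
    return w_end + (w_start - w_end) * math.exp(-rate * t / t_max)

def quantization_error(points, centroids):
    """Mean over non-empty clusters of the average point-to-centroid distance."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        # Assign each point to its nearest centroid.
        dists = [math.dist(p, c) for c in centroids]
        k = dists.index(min(dists))
        sums[k] += dists[k]
        counts[k] += 1
    per_cluster = [s / n for s, n in zip(sums, counts) if n > 0]
    return sum(per_cluster) / len(per_cluster)
```

A lower quantization error means the points sit closer to their assigned centroids, which is the sense in which the abstract calls one clustering more accurate than another.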
Abstract: Although backpropagation ANNs generally predict
better than decision trees do for pattern classification problems, they
are often regarded as black boxes, i.e., their predictions cannot be
explained as those of decision trees can. In many applications, it is
desirable to extract knowledge from trained ANNs for the users to
gain a better understanding of how the networks solve the problems.
A new rule extraction algorithm, called rule extraction from artificial
neural networks (REANN) is proposed and implemented to extract
symbolic rules from ANNs. A standard three-layer feedforward ANN
is the basis of the algorithm. A four-phase training algorithm is
proposed for backpropagation learning. Explicitness of the extracted
rules is supported by comparing them to the symbolic rules generated
by other methods. The extracted rules are comparable with those of other methods
in terms of number of rules, average number of conditions for a rule,
and predictive accuracy. Extensive experimental studies on several
benchmark classification problems, such as breast cancer, iris,
diabetes, and season classification problems, demonstrate the
effectiveness of the proposed approach with good generalization
ability.