Abstract: XML data has a very flexible tree structure, which makes
XML data difficult to store and retrieve. The node numbering scheme
is one of the most popular approaches to storing XML in relational
databases. Together with the
node numbering storage scheme, structural joins can be used to
efficiently process the hierarchical relationships in XML. However,
processing a tree-structured XPath query that contains several
hierarchical relationships and conditional predicates on XML data
requires many structural joins, which results in a high query
execution cost. This paper introduces mechanisms that rewrite XPath
queries containing branch nodes into a much more efficient form that
requires fewer structural joins. A two-step approach is proposed.
The first step merges duplicate nodes in the tree-structured query and
the second step divides the query into sub-queries, shortens the paths
and then merges the sub-queries back together. The proposed
approach can substantially improve the efficiency of XML query
execution. Experimental results show that the proposed scheme can
reduce the query execution cost by up to an order of magnitude.
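The Python sketch below is a rough illustration of the storage scheme and join that this abstract builds on. It gives each element a (start, end, level) region label and evaluates one ancestor-descendant step with a stack-based structural join. The region labels are one common form of node numbering and are assumed here, not taken from the paper. The names `Node`, `number_nodes` and `structural_join` are hypothetical. The sketch does not implement the paper's duplicate-node merging or sub-query rewriting. It only shows the per-edge join whose repetition the proposed approach aims to reduce.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Illustrative sketch only: (start, end, level) region numbering is an assumed
# node numbering scheme, not necessarily the one used in the paper.


class Node(NamedTuple):
    """One row of a hypothetical relational node table."""
    tag: str
    start: int  # counter value when the element is entered
    end: int    # counter value when the element is left
    level: int  # depth in the document tree (root = 0)


def number_nodes(xml_text: str) -> list[Node]:
    """Assign (start, end, level) region labels to every element."""
    root = ET.fromstring(xml_text)
    nodes: list[Node] = []
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        nodes.append(Node(elem.tag, start, counter, level))
        counter += 1

    visit(root, 0)
    nodes.sort(key=lambda n: n.start)  # document order
    return nodes


def structural_join(ancestors: list[Node], descendants: list[Node],
                    child_only: bool = False) -> list[tuple[Node, Node]]:
    """Stack-based ancestor//descendant join; both inputs sorted by start.

    A node a contains d exactly when a.start < d.start and d.end < a.end.
    With child_only=True, only parent/child pairs (level difference 1) are kept.
    """
    result: list[tuple[Node, Node]] = []
    stack: list[Node] = []  # chain of nested ancestor candidates
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()  # this entry cannot contain a or any later node
            stack.append(a)
            i += 1
        # Drop entries that ended before d starts; the remaining ones all contain d.
        while stack and stack[-1].end < d.start:
            stack.pop()
        for a in stack:
            if not child_only or a.level + 1 == d.level:
                result.append((a, d))
    return result
```

For example, take `"<book><chapter><title/><section><title/></section></chapter><title/></book>"`. Joining its `chapter` and `title` lists returns both titles inside the chapter, and `child_only=True` keeps only the direct child. A branching query such as `//chapter[title]//section` would need one such join per query edge.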
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data: it is much more complex, usually large in size, and continuously changing. Until recently, business data was the main target for discovering trends, patterns and future expectations. However, with the recent rise of biotechnology, the powerful technology that was used to analyze business data is now being applied to biological data. With these advanced technologies at hand, the main focus of biological research is rapidly shifting from structural DNA analysis to understanding the cellular functions of DNA sequences. Researchers now use DNA chips to perform experiments and DNA analysis. Clustering is an important process for grouping similar entities together, and there are many clustering algorithms, such as hierarchical clustering, self-organizing maps and K-means clustering. In this paper, we propose a clustering algorithm that imitates an ecosystem and takes into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm, and the system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm on test data of 100 to 1000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.
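The abstract does not detail its Ant-Colony clustering procedure, so the sketch below rests on an assumption. It uses the classic ant-based clustering model: ants wander on a toroidal grid, pick up an item with probability (k1/(k1+f))² and drop it with probability (f/(k2+f))². Here f is a Lumer-Faieta-style local similarity density. Clusters are then read off as connected groups of occupied cells, so their number is not fixed in advance. Euclidean distance, the parameter defaults and the name `ant_cluster` are illustrative choices, and the Topic Map drawing step is omitted.

```python
import math
import random
from collections import deque


def ant_cluster(data, grid=20, n_ants=10, steps=20000,
                k1=0.1, k2=0.15, alpha=1.0, seed=0):
    """Ant-based clustering sketch; returns clusters as sorted lists of row indices.

    data: one numeric vector per gene (e.g. expression values across samples).
    Assumption: the pick/drop rules, distance and defaults are illustrative,
    not the paper's exact algorithm or parameters.
    """
    if len(data) >= grid * grid:
        raise ValueError("grid must have more cells than data items")
    rng = random.Random(seed)
    cells = {}  # (x, y) -> index of the item lying on that cell

    for idx in range(len(data)):  # scatter items on random free cells
        pos = (rng.randrange(grid), rng.randrange(grid))
        while pos in cells:
            pos = (rng.randrange(grid), rng.randrange(grid))
        cells[pos] = idx

    def density(item, pos):
        """Lumer-Faieta-style similarity of `item` to its 3x3 neighbourhood."""
        x, y = pos
        total = 0.0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                other = cells.get(((x + dx) % grid, (y + dy) % grid))
                if other is not None and other != item:
                    total += 1 - math.dist(data[item], data[other]) / alpha
        return max(0.0, total / 9)

    # Each ant is [position, carried item index or None].
    ants = [[(rng.randrange(grid), rng.randrange(grid)), None] for _ in range(n_ants)]
    for _ in range(steps):
        for ant in ants:
            x, y = ant[0]
            pos = ((x + rng.choice((-1, 0, 1))) % grid,
                   (y + rng.choice((-1, 0, 1))) % grid)
            ant[0] = pos
            if ant[1] is None and pos in cells:
                f = density(cells[pos], pos)
                if rng.random() < (k1 / (k1 + f)) ** 2:  # dissimilar surroundings: likely pick-up
                    ant[1] = cells.pop(pos)
            elif ant[1] is not None and pos not in cells:
                f = density(ant[1], pos)
                if rng.random() < (f / (k2 + f)) ** 2:  # similar surroundings: likely drop
                    cells[pos] = ant[1]
                    ant[1] = None

    for ant in ants:  # put down anything still carried, at its best free cell
        if ant[1] is not None:
            free = [(x, y) for x in range(grid) for y in range(grid) if (x, y) not in cells]
            cells[max(free, key=lambda p: density(ant[1], p))] = ant[1]
            ant[1] = None

    # Clusters = connected groups of occupied cells, so their number is not preset.
    clusters, seen = [], set()
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            x, y = queue.popleft()
            members.append(cells[(x, y)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = ((x + dx) % grid, (y + dy) % grid)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(sorted(members))
    return clusters
```

Each row of `data` would be one gene's expression vector across the samples, for example 24 values per gene. The grid needs more cells than there are genes. In this simplified read-out, groups that happen to touch on the grid merge into one cluster. A fuller implementation might add a similarity check to split them.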
Abstract: Bio-chips are used for experiments on genes and
contain various kinds of information, such as genes and samples.
Two-dimensional bio-chips, in which one axis represents genes and the
other represents samples, are widely used these days. Because
experimenting with real genes costs a great deal of money and takes a
long time to produce results, bio-chips are used for biological
experiments instead. Extracting data from bio-chips with high
accuracy and finding patterns or other useful information in that
data is very important. Bio-chip analysis systems extract data from
various kinds of bio-chips and mine the data in order to get useful
information. One of the commonly used methods to mine the data is
classification. The choice of classification algorithm can vary
depending on the data types, their numerical characteristics and so
on. Considering that bio-chip data is extremely large, an algorithm
that imitates an ecosystem, such as the ant algorithm, is well suited
to classification. This paper focuses on finding classification rules
in bio-chip data using the Ant Colony algorithm, which imitates an
ecosystem. The developed system takes into consideration the accuracy
of the discovered rules when it applies them to the bio-chip data in
order to predict the classes.
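The abstract names the Ant Colony algorithm without specifying the variant. The following is therefore only a simplified sketch in the spirit of Ant-Miner-style rule discovery, and it assumes discretized expression values such as `"high"`/`"low"`. Each ant builds an IF-THEN rule term by term, choosing terms with probability proportional to pheromone times a class-purity heuristic. Rules are scored by sensitivity × specificity. The best rule of each round is kept, and the cases it covers are removed. Rule pruning and the exact formulas of published algorithms are omitted, and `Rule`, `discover_rules` and `predict` are hypothetical names. Following the last sentence of the abstract, prediction uses the most accurate rule that covers a case.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Simplified sketch only: the heuristic, quality measure and pheromone update
# are assumptions in the style of Ant-Miner, not the paper's exact method.


@dataclass(frozen=True)
class Rule:
    terms: frozenset  # {(attribute, value), ...}, combined with AND
    label: str        # predicted class
    accuracy: float   # share of the (remaining) training cases it covers that have `label`

    def covers(self, case: dict) -> bool:
        return all(case.get(attr) == val for attr, val in self.terms)


def build_rule(cases, labels, tau, rng, min_cases):
    """One ant adds terms with probability proportional to pheromone * class purity."""
    terms, covered = set(), list(range(len(cases)))
    while True:
        used = {attr for attr, _ in terms}
        options = []
        for term, pheromone in tau.items():
            if term[0] in used:
                continue
            sub = [i for i in covered if cases[i].get(term[0]) == term[1]]
            if len(sub) >= min_cases:
                purity = Counter(labels[i] for i in sub).most_common(1)[0][1] / len(sub)
                options.append((term, sub, pheromone * purity))
        if not options:
            break
        term, covered, _ = rng.choices(options, weights=[opt[2] for opt in options])[0]
        terms.add(term)
    label, hits = Counter(labels[i] for i in covered).most_common(1)[0]
    return Rule(frozenset(terms), label, hits / len(covered))


def quality(rule, cases, labels):
    """Sensitivity x specificity of a rule on the given cases."""
    tp = fp = fn = tn = 0
    for case, y in zip(cases, labels):
        hit, pos = rule.covers(case), y == rule.label
        tp += hit and pos
        fp += hit and not pos
        fn += not hit and pos
        tn += not hit and not pos
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 1.0
    return sens * spec


def discover_rules(cases, labels, n_ants=20, min_cases=2, max_uncovered=2, seed=0):
    """Sequential covering: keep the best ant's rule, drop the cases it covers."""
    rng = random.Random(seed)
    cases, labels = list(cases), list(labels)
    rules = []
    while len(cases) > max(0, max_uncovered):
        tau = {(a, v): 1.0 for case in cases for a, v in case.items()}
        best, best_q = None, -1.0
        for _ in range(max(1, n_ants)):
            rule = build_rule(cases, labels, tau, rng, max(1, min_cases))
            q = quality(rule, cases, labels)
            if q > best_q:
                best, best_q = rule, q
            for term in rule.terms:  # reinforce the terms this ant used
                tau[term] += tau[term] * q
            total = sum(tau.values())  # normalising acts as evaporation
            tau = {term: level / total for term, level in tau.items()}
        rules.append(best)
        keep = [i for i, case in enumerate(cases) if not best.covers(case)]
        cases, labels = [cases[i] for i in keep], [labels[i] for i in keep]
    return sorted(rules, key=lambda r: r.accuracy, reverse=True)


def predict(rules, case, default=None):
    """Apply the most accurate rule that covers `case`."""
    for rule in sorted(rules, key=lambda r: r.accuracy, reverse=True):
        if rule.covers(case):
            return rule.label
    return default
```

For instance, `predict(rules, {"g1": "high", "g2": "low"})` returns the label of the highest-accuracy rule that matches the case. It returns `default` when no rule applies.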