A Two-Step Approach for Tree-structured XPath Query Reduction

XML data consists of a very flexible tree-structure which makes it difficult to support the storing and retrieving of XML data. The node numbering scheme is one of the most popular approaches to store XML in relational databases. Together with the node numbering storage scheme, structural joins can be used to efficiently process the hierarchical relationships in XML. However, in order to process a tree-structured XPath query containing several hierarchical relationships and conditional sentences on XML data, many structural joins need to be carried out, which results in a high query execution cost. This paper introduces mechanisms to reduce the XPath queries including branch nodes into a much more efficient form with less numbers of structural joins. A two step approach is proposed. The first step merges duplicate nodes in the tree-structured query and the second step divides the query into sub-queries, shortens the paths and then merges the sub-queries back together. The proposed approach can highly contribute to the efficient execution of XML queries. Experimental results show that the proposed scheme can reduce the query execution cost by up to an order of magnitude of the original execution cost.

An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data

Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.

Classifying Bio-Chip Data using an Ant Colony System Algorithm

Bio-chips are used for experiments on genes and contain various information such as genes, samples and so on. The two-dimensional bio-chips, in which one axis represent genes and the other represent samples, are widely being used these days. Instead of experimenting with real genes which cost lots of money and much time to get the results, bio-chips are being used for biological experiments. And extracting data from the bio-chips with high accuracy and finding out the patterns or useful information from such data is very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data in order to get useful information. One of the commonly used methods to mine the data is classification. The algorithm that is used to classify the data can be various depending on the data types or number characteristics and so on. Considering that bio-chip data is extremely large, an algorithm that imitates the ecosystem such as the ant algorithm is suitable to use as an algorithm for classification. This paper focuses on finding the classification rules from the bio-chip data using the Ant Colony algorithm which imitates the ecosystem. The developed system takes in consideration the accuracy of the discovered rules when it applies it to the bio-chip data in order to predict the classes.