An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data

Biological data has several characteristics that strongly differentiate it from typical business data: it is far more complex, usually large, and changes continuously. Until recently, business data was the main target for discovering trends, patterns, and future expectations. With the recent rise of biotechnology, however, the powerful techniques developed for analyzing business data are now being applied to biological data. With this technology at hand, the main trend in biological research is rapidly shifting from structural DNA analysis to understanding the cellular functions of DNA sequences. DNA chips are now used to run experiments, and researchers apply analysis procedures to the data they produce. Clustering, which groups similar entities together, is one of the important steps in this analysis. Many clustering algorithms exist, including hierarchical clustering, self-organizing maps, and K-means clustering. In this paper, we propose a clustering algorithm that imitates an ecosystem and takes the characteristics of biological data into account. We implemented the system using an Ant-Colony clustering algorithm, and the system decides the number of clusters automatically. It processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes, and displays the output. We tested the algorithm on test data sets of 100 to 1000 genes and 24 samples, and the results are promising for applying the algorithm to clustering DNA chip data.
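To make the general idea of ant-based clustering concrete, the following is a minimal Python sketch. It is not the system described above. It assumes a standard pick-and-drop scheme on a toroidal grid, Euclidean distance between expression profiles, and clusters read off as connected groups of occupied cells, which is one way the number of clusters can emerge without being specified in advance. All names and parameter values (ant_cluster, label_clusters, alpha, k_pick, k_drop, the grid size) are hypothetical choices made for illustration.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_similarity(item, pos, grid, data, size, alpha, radius=1):
    """Average similarity of `item` to the items around `pos` (clipped at 0)."""
    total = 0.0
    cells = (2 * radius + 1) ** 2
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            cell = ((pos[0] + dx) % size, (pos[1] + dy) % size)
            other = grid.get(cell)
            if other is not None and other != item:
                total += 1.0 - distance(data[item], data[other]) / alpha
    return max(0.0, total / cells)


def ant_cluster(data, size=12, n_ants=5, steps=20000, alpha=1.0,
                k_pick=0.1, k_drop=0.15, seed=0):
    """Scatter items on a size x size grid and let ants rearrange them.

    Assumes len(data) <= size * size. Returns a dict mapping cell -> item index.
    """
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    grid = dict(zip(rng.sample(cells, len(data)), range(len(data))))
    ants = [{"pos": rng.choice(cells), "load": None} for _ in range(n_ants)]
    for _ in range(steps):
        ant = rng.choice(ants)
        x, y = ant["pos"]
        dx, dy = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
        ant["pos"] = pos = ((x + dx) % size, (y + dy) % size)
        if ant["load"] is None and pos in grid:
            # Unladen ant on an item: pick it up if it fits its surroundings poorly.
            f = local_similarity(grid[pos], pos, grid, data, size, alpha)
            if rng.random() < (k_pick / (k_pick + f)) ** 2:
                ant["load"] = grid.pop(pos)
        elif ant["load"] is not None and pos not in grid:
            # Laden ant on a free cell: drop the item if the surroundings are similar.
            f = local_similarity(ant["load"], pos, grid, data, size, alpha)
            if rng.random() < (f / (k_drop + f)) ** 2:
                grid[pos] = ant["load"]
                ant["load"] = None
    for ant in ants:
        # Put down anything still being carried on a random free cell.
        if ant["load"] is not None:
            free = [c for c in cells if c not in grid]
            grid[rng.choice(free)] = ant["load"]
            ant["load"] = None
    return grid


def label_clusters(grid, size):
    """Label items by the connected group of occupied cells they sit in."""
    labels, next_label = {}, 0
    for start in grid:
        if grid[start] in labels:
            continue
        labels[grid[start]] = next_label
        stack = [start]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    cell = ((x + dx) % size, (y + dy) % size)
                    if cell in grid and grid[cell] not in labels:
                        labels[grid[cell]] = next_label
                        stack.append(cell)
        next_label += 1
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy input: 30 synthetic "genes" with 24 samples each, not real chip data.
    data = [[rng.gauss(c, 0.1) for _ in range(24)]
            for c in (0.0, 1.0) for _ in range(15)]
    grid = ant_cluster(data, alpha=2.0)
    labels = label_clusters(grid, 12)
    print(len(set(labels.values())), "clusters found")
```

In this sketch the number of clusters is simply the number of connected groups left on the grid, so it depends on the grid size, the number of steps, and alpha. A practical system would need to tune these values and would likely use a distance measure suited to expression profiles.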




