Automatic Clustering of Gene Ontology by Genetic Algorithm

Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.





References:
[1] The Gene Ontology Consortium, "Gene ontology: tool for the
unification of biology," Nat. Genet., vol. 25, no. 1, pp. 25-29, May 2000.
[2] M. Kaneko, M. Miki, and T. Hiroyasu, "A parallel genetic algorithm
with distributed environment scheme," in Proc. 4th. Int. Conf. Parallel
& Distributed Processing Techniques & Applications, Las Vegas, NV,
2000, pp. 619-625.
[3] S. Dutt and W. Deng, "Probability-based approaches to VLSI circuit
partitioning," IEEE Trans. Computer-Aided Design of Integrated
Circuits & Systems, vol. 19, no. 5, pp. 534-549, May 2000.
[4] C. Walshaw and M. Cross, "Parallel optimisation algorithms for
multilevel mesh partitioning," Parallel Computing, vol. 26, no. 12, pp.
1635-1660, Nov. 2000.
[5] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE
Trans. Pattern Analysis & Machine Intelligence, vol. 22, no. 8, pp. 888-
905, Aug. 2000.
[6] G. Getz, H. Gal, I. Kela, D.A. Notterman, and E. Domany, "Coupled
two-way clustering analysis of breast cancer and colon cancer gene
expression data," Bioinformatics, vol. 19, no. 9, pp. 1079-1089, Jun.
2003.
[7] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman,
and A.Y. Wu, "An efficient k-means clustering algorithm: analysis and
implementation," IEEE Trans. Pattern Analysis & Machine Intelligence,
vol. 24, no. 7, pp. 881-892, Jul. 2002.
[8] C.H. Cheng, W.K. Lee, and K.F. Wong, "A genetic algorithm-based
clustering approach for database partitioning," IEEE Trans. Systems,
Man, & Cybernetics, vol. 32, no. 3, pp. 215-230, Aug. 2002.
[9] S. G├╝nter and H. Bunke, "Self-organizing map for clustering in the
graph domain," Pattern Recognition Letters, vol. 23, no. 4, pp. 405-417,
Feb. 2002.
[10] R.J. Hathaway and J.C. Bezdek, "Fuzzy c-means clustering of
incomplete data," IEEE Trans. Systems, Man, & Cybernetics, vol. 31, no.
5, pp. 735-744, Oct. 2001.
[11] C.Y. Chen and F. Ye, "Particle swarm optimization algorithm and its
application to clustering analysis," in Proc. 1st. Int. Conf. Networking,
Sensing, & Control, Taipei, Taiwan, 2004, pp. 789-794.
[12] A.K. Jain, M.N. Murty, and P.J. Flynn, "Data clustering: a review," ACM
Computing Surveys, vol. 31, no. 3, pp. 264-323, Sep. 1999.
[13] P. Berkhin, Survey of Clustering Data Mining Techniques, Accrue
Software Inc., San Jose, CA, 2002.
[14] S. Kotsiantis and P. Pintelas, "Recent advances in clustering: a brief
survey," WSEAS Trans. Information Science & Applications, vol. 1, no.
1, pp. 73-81, Jul. 2004.
[15] D. Pelleg and A. Moore, "X-means: extending k-means with efficient
estimation of the number of clusters," in Proc. 17th. Int. Conf. Machine
Learning, Stanford, CA, 2000, pp. 727-734.
[16] G. Hamerly and C. Elkan. (2003, Dec.). Learning the k in k-means. in
Proc. 17th. Conf. Neural Information Processing Systems. Vancouver,
Canada. Available: http://books.nips.cc/nips16.html.
[17] L.Y. Tseng and S.B. Yang, "A genetic approach to the automatic
clustering problem," Pattern Recognition, vol. 34, no. 2, pp. 415-424,
Feb. 2001.
[18] G. Garai and B.B. Chaudhuri, "A novel genetic algorithm for automatic
clustering," Pattern Recognition Letters, vol. 25, no. 2, pp. 173-187, Jan.
2004.
[19] R.J. Kuo, K. Chang, and S.Y. Chien, "Integration of self-organizing
feature maps and genetic-algorithm-based clustering method for market
segmentation," J. Organizational Computing & Electronic Commerce,
vol. 14, no. 1, pp. 43-60, Jan. 2004.
[20] C. Walshaw and M. Cross, "Mesh partitioning: a multilevel balancing
and refinement algorithm," SIAM J. Scientific Computing, vol. 22, no. 1,
pp. 63-80, Jan. 2000.
[21] S.J. D-Amico, S.J. Wang, R. Batta, and C.M. Rump, "A simulated
annealing approach to police district design," Computers & Operations
Research, vol. 29, no. 6, pp. 667-684, May 2002.
[22] Y.G. Saab, "An effective multilevel algorithm for bisecting graphs and
hypergraphs," IEEE Trans. Computers, vol. 53, no. 6, pp. 641-652, Jun.
2004.
[23] G. Wolfe, J.L. Wong, and M. Potkonjak, "Watermarking graph
partitioning solutions," IEEE Trans. Computer-Aided Design of
Integrated Circuits & Systems, vol. 21, no. 10, pp. 1196-1204, Oct.
2002.
[24] U. Elsner, "Graph partitioning: a survey," Technische Universitat
Chemnitz, Chemnitz, Germany, Tech. Rep. 393, Dec. 1997.
[25] P.O. Fj├ñllström. (1998, Sep.). Algorithms for graph partitioning: a
survey. Linköping Electronic Articles Computer & Information Science.
3(10). Available: http://www. ep.liu.se/ea/cis/1998/010.
[26] T.N. Bui and B.R. Moon, "Genetic algorithm and graph partitioning,"
IEEE Trans. Computers, vol. 45, no. 7, pp. 841-855, Jul. 1996.
[27] A. Kaveh and H.A.R. Bondarabady, "A hybrid graph-genetic method for
domain decomposition," Finite Elements in Analysis & Design, vol. 39,
no. 13, pp. 1237-1247, Oct. 2003.
[28] K. Kohmoto, K. Katayama, and H. Narihisa, "Performance of a genetic
algorithm for the graph partitioning problem," Mathematical &
Computer Modelling, vol. 38, no. 11-13, pp. 1325-1332, Dec. 2003.
[29] H. Stuckenschmidt and M. Klein, "Structure-based partitioning of large
concept hierarchies," in Proc. 3rd. Int. Conf. Semantic Web, Hiroshima,
Japan, 2004, pp. 289-303.
[30] M. Wall. (1996, Aug.). GAlib: a C++ library of genetic algorithm
components. Available: http://lancet.mit.edu/ga/dist/galibdoc.pdf.
[31] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, "A high-performance,
portable implementation of the MPI message-passing interface
standard," Parallel Computing, vol. 22, no. 6, pp. 789-828, Sep. 1996.