Incorporating Semantic Similarity Measure in Genetic Algorithm: An Approach for Searching the Gene Ontology Terms

The terms are the most important component of the Gene Ontology. These controlled vocabularies are defined to provide consistent descriptions of gene products that are shareable and computationally accessible to humans, software agents, or other consumers of machine-readable metadata. Each term is associated with information such as a definition, synonyms, database references, amino acid sequences, and relationships to other terms. This information has led to the Gene Ontology being broadly applied in microarray and proteomic analysis. However, searching the terms is still carried out with the traditional approach, which is based on keyword matching. This approach has two weaknesses: it ignores the semantic relationships between terms, and it depends heavily on a specialist to find similar terms. Therefore, this study combines a semantic similarity measure with a genetic algorithm to improve the retrieval of semantically similar terms. The semantic similarity measure computes the strength of similarity between two terms. The genetic algorithm is then employed to perform batch retrievals and to cope with the large search space of the Gene Ontology graph. Computational results are presented to show the effectiveness of the proposed algorithm.
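To make the combination concrete, the following is a minimal sketch of how a similarity measure can serve as the fitness function of a genetic algorithm that retrieves a batch of terms. It is not the algorithm evaluated in this study. The toy ontology, its term names, the choice of Jaccard overlap of ancestor sets as the similarity measure, and all parameters (`k`, `pop_size`, `generations`, `p_mut`, `seed`) are assumptions made for illustration only.

```python
import random

# Hypothetical toy ontology (child -> parents); these are NOT real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
    "protein_kinase": ["kinase"],
    "dna_helicase": ["dna_binding", "catalysis"],
}
TERMS = sorted(PARENTS)


def ancestors(term):
    """Return the term together with all of its ancestors in the graph."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def similarity(a, b):
    """Assumed stand-in measure: Jaccard overlap of ancestor sets, in [0, 1]."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def fitness(chromosome, query):
    """Mean similarity between the query and every term in the batch."""
    return sum(similarity(t, query) for t in chromosome) / len(chromosome)


def search(query, k=2, pop_size=12, generations=30, p_mut=0.2, seed=0):
    """Evolve a batch of k distinct terms that are similar to the query."""
    rng = random.Random(seed)
    candidates = [t for t in TERMS if t != query]
    score = lambda c: fitness(c, query)
    # Each chromosome is a list of k distinct candidate terms.
    population = [rng.sample(candidates, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_pop = [list(c) for c in population[:2]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Crossover: draw k distinct terms from the parents' combined terms.
            pool = list(dict.fromkeys(p1 + p2))
            child = rng.sample(pool, k)
            # Mutation: swap one term for a candidate not already in the child.
            if rng.random() < p_mut:
                options = [t for t in candidates if t not in child]
                if options:
                    child[rng.randrange(k)] = rng.choice(options)
            next_pop.append(child)
        population = next_pop
    return max(population, key=score)
```

Calling `search("protein_kinase")` returns a list of `k` distinct terms from the toy ontology. On a graph as large as the Gene Ontology, the same structure applies, but the similarity measure, chromosome encoding, and genetic operators would need to be chosen and tuned for that setting.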



