Incorporating Semantic Similarity Measure in Genetic Algorithm: An Approach for Searching the Gene Ontology Terms
The most important property of the Gene Ontology is its
terms. These controlled vocabularies are defined to provide
consistent descriptions of gene products that are shareable and
computationally accessible by humans, software agents, or other
machine-readable metadata. Each term is associated with
information such as a definition, synonyms, database references, amino
acid sequences, and relationships to other terms. This information has
led to the Gene Ontology being broadly applied in microarray and
proteomic analysis. However, searching the terms is still carried out
using the traditional approach, which is based on keyword
matching. This approach has two weaknesses: it ignores semantic
relationships between terms, and it depends heavily on a specialist to
find similar terms. Therefore, this study combines a semantic similarity
measure with a genetic algorithm to improve the retrieval of
semantically similar terms. The semantic similarity
measure computes the strength of similarity between two terms.
The genetic algorithm is then employed to perform batch retrievals
and to cope with the large search space of the Gene
Ontology graph. Computational results are presented to show the
effectiveness of the proposed algorithm.
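
As a rough illustration of the idea described above, the following Python sketch pairs a semantic similarity measure with a small genetic algorithm. It is not the authors' implementation: the abstract does not state which measure, chromosome encoding, or genetic operators are used. The sketch assumes Lin's information-theoretic measure, a hypothetical seven-term toy ontology with made-up annotation probabilities, and a chromosome that encodes a batch of k candidate terms whose fitness is their mean similarity to the query term. All names (PARENTS, PROB, lin_similarity, genetic_search) are invented for this example.

```python
import math
import random

# Hypothetical toy ontology: term -> list of parents (a DAG, like the GO graph).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
    "dna_kinase": ["kinase", "dna_binding"],
}
# Made-up annotation probabilities p(t); they never decrease towards the root.
PROB = {
    "root": 1.0, "binding": 0.5, "catalysis": 0.4,
    "dna_binding": 0.2, "rna_binding": 0.2, "kinase": 0.2, "dna_kinase": 0.05,
}
TERMS = sorted(PARENTS)


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(term):
    """IC(t) = -log p(t); rarer terms are more informative."""
    return -math.log(PROB[term])


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(best common ancestor) / (IC(a) + IC(b))."""
    denom = information_content(a) + information_content(b)
    if denom == 0:
        return 1.0  # both terms are the root
    common = ancestors(a) & ancestors(b)
    return 2 * max(information_content(t) for t in common) / denom


def fitness(batch, query):
    """Mean similarity between the query and every term in the batch."""
    return sum(lin_similarity(query, t) for t in batch) / len(batch)


def genetic_search(query, k=3, pop_size=12, generations=30,
                   mutation_rate=0.2, seed=0):
    """Evolve batches of k distinct terms towards high similarity to query."""
    rng = random.Random(seed)
    candidates = [t for t in TERMS if t != query]
    population = [rng.sample(candidates, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=lambda b: fitness(b, query), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: draw the child's k terms from both parents' terms.
            pool = sorted(set(p1) | set(p2))
            child = rng.sample(pool, k)
            # Mutation: replace one term with a term not yet in the child.
            if rng.random() < mutation_rate:
                unused = [t for t in candidates if t not in child]
                child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return max(population, key=lambda b: fitness(b, query))
```

Calling genetic_search("dna_kinase") returns one batch of three terms, and fitness(batch, "dna_kinase") gives its mean similarity to the query. On a graph this small an exhaustive scan would be simpler; the evolutionary search only pays off when the number of possible batches is too large to enumerate, which is the situation the abstract describes for the Gene Ontology graph.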
@article{"International Journal of Information, Control and Computer Sciences:64483",
  author = "Razib M. Othman and Safaai Deris and Rosli M. Illias and Hany T. Alashwal and Rohayanti Hassan and Farhan Mohamed",
  title = "Incorporating Semantic Similarity Measure in Genetic Algorithm: An Approach for Searching the Gene Ontology Terms",
  keywords = "Gene Ontology, Semantic similarity measure, Genetic algorithm, Ontology search",
  volume = "1",
  number = "12",
  pages = "4097-10",
}