Abstract: The most important property of the Gene Ontology is
the terms. These control vocabularies are defined to provide
consistent descriptions of gene products that are shareable and
computationally accessible by humans, software agent, or other
machine-readable meta-data. Each term is associated with
information such as definition, synonyms, database references, amino
acid sequences, and relationships to other terms. This information has
made the Gene Ontology broadly applied in microarray and
proteomic analysis. However, the process of searching the terms is
still carried out using traditional approach which is based on keyword
matching. The weaknesses of this approach are: ignoring semantic
relationships between terms, and highly depending on a specialist to
find similar terms. Therefore, this study combines semantic similarity
measure and genetic algorithm to perform a better retrieval process
for searching semantically similar terms. The semantic similarity
measure is used to compute similitude strength between two terms.
Then, the genetic algorithm is employed to perform batch retrievals
and to handle the situation of the large search space of the Gene
Ontology graph. The computational results are presented to show the
effectiveness of the proposed algorithm.
Abstract: This paper is concerned with the production of an Arabic word semantic similarity benchmark dataset. It is the first of its kind for Arabic which was particularly developed to assess the accuracy of word semantic similarity measurements. Semantic similarity is an essential component to numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English. To the best of our knowledge, there is no word similarity measure developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and best possible available techniques have been used in this study to produce the Arabic dataset. This includes selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS and hopefully it will be considered as a reference basis from which to evaluate and compare different methodologies in the field.