Fuzzy C-Means Clustering for Biomedical Documents Using Ontology Based Indexing and Semantic Annotation

Search is the most obvious application of information
retrieval. The variety of widely obtainable biomedical data is
enormous and is expanding fast. Because of this expansion, existing
techniques are no longer sufficient to extract the most interesting patterns
from a collection according to user requirements. Recent research
concentrates more on semantic-based searching than on traditional
term-based searching. Algorithms for semantic search are
implemented based on the relations that exist between the words of the
documents. Ontologies are used as domain knowledge for identifying
the semantic relations as well as to structure the data for effective
information retrieval. Annotating data with ontology concepts is
one of the widely used practices for clustering documents. In
this paper, concept-based indexing and semantic annotation are proposed
for clustering biomedical documents. The fuzzy c-means (FCM)
clustering algorithm is used to cluster the documents. The
performance of the proposed methods is compared with that of traditional
term-based clustering on PubMed articles from five different disease
communities. The experimental results show that the proposed
methods outperform term-based fuzzy clustering.
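
For readers unfamiliar with FCM, the following is a minimal sketch of the standard fuzzy c-means update rules in plain Python. It is not the implementation evaluated in the paper: the fuzzifier m = 2, the tolerance, the random seed, the function name fuzzy_c_means, and the toy three-dimensional "concept weight" vectors are all assumptions chosen for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM on a list of equal-length float vectors.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update from Euclidean distances to the new centers.
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # Point coincides with a center: share membership among those.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ])

        # Stop once no membership changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Hypothetical document vectors: two documents weighted toward the first two
# concepts and two weighted toward the third.
docs = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
centers, memberships = fuzzy_c_means(docs, n_clusters=2)
for doc, row in zip(docs, memberships):
    print(doc, [round(v, 3) for v in row])
```

Unlike hard clustering, each document receives a membership degree in every cluster, so a document that touches several disease topics can belong partly to each. In a term-based setup the vector dimensions would be term weights; in a concept-based setup they would be weights of ontology concepts.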




