Classifying Biomedical Text Abstracts based on Hierarchical 'Concept' Structure

Classifying biomedical literature is a challenging task, especially when a large number of biomedical articles must be organized into a hierarchical structure. In this paper, we present an approach for classifying a collection of biomedical text abstracts downloaded from the Medline database with the help of ontology alignment. To accomplish this, we construct two types of hierarchies: the OHSUMED disease hierarchy, built from the OHSUMED dataset, and the Medline abstract disease hierarchies, built from the Medline abstracts. We then enrich the OHSUMED disease hierarchy and pass it to an ontology alignment process to find probable concepts, or categories. Subsequently, we compute the cosine similarity between the vectors of the probable concepts (in the "enriched" OHSUMED disease hierarchy) and the vectors in the Medline abstract disease hierarchies. Finally, we assign a category to each new Medline abstract based on the similarity score. Experimental results show that the proposed hierarchical classification approach performs slightly better than multi-class flat classification.
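The similarity and assignment steps described above can be illustrated with a minimal sketch. It assumes raw term-frequency vectors built from whitespace tokens; the weighting scheme actually used in the paper is not stated here and may differ. The function names (`term_vector`, `cosine_similarity`, `assign_category`) and the toy concept terms are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector (assumed weighting, for illustration)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def assign_category(abstract_vec, concept_vecs):
    """Return the candidate concept with the highest similarity and its score."""
    scores = {
        name: cosine_similarity(abstract_vec, vec)
        for name, vec in concept_vecs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidate concepts, standing in for the output of ontology alignment.
concepts = {
    "Heart Diseases": term_vector("heart cardiac myocardial infarction"),
    "Neoplasms": term_vector("tumor cancer carcinoma"),
}
abstract = term_vector("acute myocardial infarction in cardiac patients")
category, score = assign_category(abstract, concepts)
```

In this toy example the abstract shares three terms with the first concept and none with the second, so the first concept is selected. Restricting the comparison to the probable concepts returned by alignment, rather than to every node in the hierarchy, is what keeps the number of similarity computations small.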



