Classifying Biomedical Text Abstracts based on Hierarchical 'Concept' Structure
Classifying biomedical literature is a challenging task,
especially when a large number of biomedical articles must be
organized into a hierarchical structure. In this paper, we present an
approach for classifying a collection of biomedical text abstracts
downloaded from the Medline database with the help of ontology
alignment. To accomplish this goal, we construct two types of
hierarchies, the OHSUMED disease hierarchy and the Medline abstract
disease hierarchies, from the OHSUMED dataset and the Medline
abstracts, respectively. We then enrich the OHSUMED disease
hierarchy before adapting it to the ontology alignment process for
finding probable concepts or categories. Subsequently, we compute
the cosine similarity between the vectors of the probable concepts (in
the “enriched” OHSUMED disease hierarchy) and the vectors in the
Medline abstract disease hierarchies. Finally, we assign a category to
each new Medline abstract based on the similarity score. The
experimental results show that the proposed hierarchical classification
approach performs slightly better than multi-class flat classification.
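The last two steps of the approach (cosine similarity, then category assignment) can be illustrated with a minimal Python sketch. It assumes that each probable concept and each abstract is already represented as a sparse term-weight vector. The function names, the toy vectors, and the rule of picking the single highest-scoring concept are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight} dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def assign_category(abstract_vec, concept_vecs):
    """Return (category, score) for the candidate concept most similar to the abstract."""
    best_category, best_score = None, -1.0
    for category, concept_vec in concept_vecs.items():
        score = cosine_similarity(abstract_vec, concept_vec)
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score


# Hypothetical toy data: two candidate concepts and one new abstract.
concept_vecs = {
    "Neoplasms": {"tumor": 1.0, "cancer": 1.0},
    "Virus Diseases": {"virus": 1.0, "infection": 1.0},
}
abstract_vec = {"tumor": 2.0, "therapy": 1.0}

category, score = assign_category(abstract_vec, concept_vecs)
print(category, round(score, 3))
```

In this sketch the abstract shares a term only with the first concept, so that concept receives the higher score and is returned as the assigned category.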
@article{DollahAono:62388,
  author = "Rozilawati Binti Dollah and Masaki Aono",
  title = "Classifying Biomedical Text Abstracts based on Hierarchical 'Concept' Structure",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "Classifying biomedical literature is a challenging task,
especially when a large number of biomedical articles must be
organized into a hierarchical structure. In this paper, we present an
approach for classifying a collection of biomedical text abstracts
downloaded from the Medline database with the help of ontology
alignment. To accomplish this goal, we construct two types of
hierarchies, the OHSUMED disease hierarchy and the Medline abstract
disease hierarchies, from the OHSUMED dataset and the Medline
abstracts, respectively. We then enrich the OHSUMED disease
hierarchy before adapting it to the ontology alignment process for
finding probable concepts or categories. Subsequently, we compute
the cosine similarity between the vectors of the probable concepts (in
the ``enriched'' OHSUMED disease hierarchy) and the vectors in the
Medline abstract disease hierarchies. Finally, we assign a category to
each new Medline abstract based on the similarity score. The
experimental results show that the proposed hierarchical classification
approach performs slightly better than multi-class flat classification.",
  keywords = "Biomedical literature, hierarchical text classification, ontology alignment, text mining",
  volume = "5",
  number = "2",
  pages = "200-6",
}