Organization Model of Semantic Document Repository and Search Techniques for Studying Information Technology

Organizing a repository of learning documents and resources for a specialized field such as Information Technology (IT), together with search techniques based on domain knowledge or document content, is an urgent practical need in teaching, learning, and research. Several works have addressed methods of content-based organization and search. However, their results remain limited and do not meet users' demand for semantic document retrieval. This paper presents a solution for organizing a repository that supports semantic representation and semantic processing in search. The proposed solution is a model that integrates components such as an ontology describing domain knowledge, a database for the document repository, semantic representations of documents, and a file system. The model also covers the associated problems, semantic processing techniques, and advanced search techniques based on measuring semantic similarity. The solution is applied to build an IT learning materials management system for a university, with a semantic search function serving students, teachers, and managers. The application has been implemented and tested at the University of Information Technology, Ho Chi Minh City, Vietnam, and has achieved good results.



