A Hybrid Ontology Based Approach for Ranking Documents
The growing volume of information on the
Internet creates an increasing need for new (semi)automatic
methods for retrieving documents and ranking them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated under various
conditions. Our approach combines conceptual,
statistical and linguistic methods. This combination preserves the
precision of ranking without losing speed. Our approach
uses natural language processing techniques to extract phrases
from the documents and the query and to stem words. Then
an ontology-based conceptual method is used to annotate
documents and expand the query. To expand a query, the spread
activation algorithm is improved so that the expansion can be done
flexibly and along various aspects. The annotated documents and the
expanded query are then processed with statistical methods to compute
the relevance degree. The outstanding features of our
approach are (1) combining conceptual, statistical and linguistic
features of documents, (2) expanding the query with its related
concepts before comparing it to documents, (3) extracting and using
both words and phrases to compute the relevance degree, (4) improving
the spread activation algorithm so that expansion is based on
a weighted combination of different conceptual relationships and (5)
allowing variable document vector dimensions. A ranking system
called ORank is developed to implement and test the proposed
model. The test results are presented at the end of the paper.
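
To make the pipeline concrete, the following is a minimal sketch of query expansion by spread activation over weighted relationship types, followed by a statistical ranking step. It is not the ORank implementation: the toy ontology, the relation weights, the decay and threshold parameters, and the use of cosine similarity over sparse term vectors are all assumptions chosen for illustration. Stemming and phrase extraction are omitted.

```python
from math import sqrt

# Hypothetical weights: how strongly activation passes along each relation type.
RELATION_WEIGHTS = {"synonym": 0.9, "is_a": 0.7, "part_of": 0.5}

# Toy ontology (illustrative only): concept -> list of (relation, neighbour).
ONTOLOGY = {
    "car": [("synonym", "automobile"), ("is_a", "vehicle"), ("part_of", "traffic")],
    "vehicle": [("is_a", "machine")],
    "engine": [("part_of", "car")],
}

# Toy documents as sparse term-weight vectors; each has its own dimensions.
DOCS = {
    "d1": {"automobile": 2.0, "engine": 1.0},
    "d2": {"vehicle": 1.0, "machine": 1.0, "road": 3.0},
    "d3": {"banana": 1.0},
}


def expand_query(terms, decay=0.8, threshold=0.3, max_hops=2):
    """Spread activation from the query terms; return {concept: activation}."""
    activation = {t: 1.0 for t in terms}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, value in frontier.items():
            for relation, neighbour in ONTOLOGY.get(concept, []):
                # Activation is damped by the relation weight and a per-hop decay.
                passed = value * RELATION_WEIGHTS[relation] * decay
                if passed < threshold:
                    continue  # too weak to activate the neighbour
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    return activation


def cosine(query_vec, doc_vec):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
    norm_q = sqrt(sum(w * w for w in query_vec.values()))
    norm_d = sqrt(sum(w * w for w in doc_vec.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def rank(query_terms, docs):
    """Expand the query, then sort documents by descending similarity."""
    query_vec = expand_query(query_terms)
    scores = {name: cosine(query_vec, vec) for name, vec in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(["car"], DOCS):
        print(f"{name}: {score:.3f}")
```

In this sketch, changing the entries of `RELATION_WEIGHTS` shifts the expansion toward particular kinds of relationships, which is one way to read the "weighted combination of different conceptual relationships" described above. Storing vectors as dictionaries lets each document keep only the dimensions it actually uses.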
@article{"International Journal of Information, Control and Computer Sciences:53999",
  author = "Sarah Motiee and Azadeh Nematzadeh and Mehrnoush Shamsfard",
  title = "A Hybrid Ontology Based Approach for Ranking Documents",
  abstract = "The growing volume of information on the
Internet creates an increasing need for new (semi)automatic
methods for retrieving documents and ranking them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated under various
conditions. Our approach combines conceptual,
statistical and linguistic methods. This combination preserves the
precision of ranking without losing speed. Our approach
uses natural language processing techniques to extract phrases
from the documents and the query and to stem words. Then
an ontology-based conceptual method is used to annotate
documents and expand the query. To expand a query, the spread
activation algorithm is improved so that the expansion can be done
flexibly and along various aspects. The annotated documents and the
expanded query are then processed with statistical methods to compute
the relevance degree. The outstanding features of our
approach are (1) combining conceptual, statistical and linguistic
features of documents, (2) expanding the query with its related
concepts before comparing it to documents, (3) extracting and using
both words and phrases to compute the relevance degree, (4) improving
the spread activation algorithm so that expansion is based on
a weighted combination of different conceptual relationships and (5)
allowing variable document vector dimensions. A ranking system
called ORank is developed to implement and test the proposed
model. The test results are presented at the end of the paper.",
  keywords = "Document ranking, Ontology, Spread activation algorithm, Annotation.",
  volume = "1",
  number = "11",
  pages = "3453-6",
}