A Comparative Study of Page Ranking Algorithms for Information Retrieval

This paper introduces Web mining and its categories, describes Web structure mining in detail, and explores the data structure used by the Web. It then discusses and compares several PageRank-based algorithms used for information retrieval: PageRank (PR), Weighted PageRank (WPR), HITS (Hyperlink-Induced Topic Search), DistanceRank, and DirichletRank. Rank values are calculated with the PageRank and Weighted PageRank algorithms for a given hyperlink structure. A simulation program is developed for the PageRank algorithm because, of the algorithms discussed, PageRank is the only one implemented in a search engine (Google). The outputs are shown in table and chart form.
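As an illustration of what calculating PageRanks for a given hyperlink structure involves, the following is a minimal sketch and not the simulation program described in this paper. It assumes the commonly cited iterative form PR(u) = (1 - d) + d * sum(PR(v) / N_v) over all pages v linking to u, where N_v is the number of outlinks of v. The damping factor d = 0.85, the function name `pagerank`, and the three-page graph are hypothetical choices made only for this example.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=200):
    """Iteratively compute PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    The graph and the damping factor d are illustrative assumptions.
    """
    pages = sorted(links)
    # Start every page with the same rank.
    ranks = {p: 1.0 for p in pages}
    # Precompute, for each page, the pages that link to it.
    inbound = {p: [q for q in pages if p in links[q]] for p in pages}

    for _ in range(max_iter):
        new_ranks = {
            p: (1 - d) + d * sum(ranks[q] / len(links[q]) for q in inbound[p])
            for p in pages
        }
        # Stop once no rank changes by more than the tolerance.
        if max(abs(new_ranks[p] - ranks[p]) for p in pages) < tol:
            return new_ranks
        ranks = new_ranks
    return ranks


# Hypothetical hyperlink structure: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
example_ranks = pagerank(example_links)

for page, rank in sorted(example_ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {rank:.4f}")
```

In this example graph, page C receives links from both A and B and ends up with the highest rank, while B, which is linked only from A, ends up with the lowest. Because every page here has at least one outlink, the ranks sum to the number of pages; graphs with dangling pages would need additional handling that this sketch omits.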



