Comparative Study of Universities’ Web Structure Mining

This paper analyzes the ranking of the website of University of Malaysia Terengganu (UMT) on the World Wide Web. Few studies have compared the rankings of university websites, so this research helps determine whether the existing UMT website serves its purpose, which is to introduce UMT to the world. The ranking is based on hub and authority values, which are derived from the link structure of the website. These values are computed using two web-search algorithms, HITS and SALSA. The websites of three other universities, UM, Harvard and Stanford, are used as benchmarks. The results show that more work is needed on the existing UMT website: pages that the benchmarks identify as important do not exist among UMT’s pages. The ranking of UMT’s website can serve as a guideline for web developers in building a more effective website.
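To make the notion of hub and authority values concrete, the following is a minimal sketch of the HITS iteration on a small link graph. It is illustrative only and is not the implementation used in this study: the page names, the `links` dictionary, the `hits` function and the fixed iteration count are all assumptions introduced for the example. SALSA is not shown.

```python
from math import sqrt


def hits(links, iterations=50):
    """Compute hub and authority scores with the basic HITS iteration.

    links: dict mapping each page to the list of pages it links to
           (a hypothetical site graph, not data from the study).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of the pages each page links to.
        new_hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical five-page university site (assumed for illustration).
links = {
    "home": ["admissions", "research", "library"],
    "faculty": ["research", "library"],
    "admissions": ["home"],
    "research": [],
    "library": [],
}

hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(f"{page:12s} authority={auth[page]:.3f} hub={hub[page]:.3f}")
```

In this toy graph, the page that links out to the most well-cited pages receives the highest hub value, and the pages cited by several hubs receive the highest authority values. The same reading applies when comparing a university website against benchmark sites.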



