Abstract: This paper gives an introduction to Web mining, describes
Web structure mining in detail, and explores the data structure used
by the Web. It first explains the basics of Web mining and the Web
mining categories. It then discusses and compares several
PageRank-based and related link analysis algorithms used for
information retrieval: PageRank (PR), Weighted PageRank (WPR), HITS
(Hyperlink-Induced Topic Search), DistanceRank, and DirichletRank.
Rank values are calculated with the PageRank and Weighted PageRank
algorithms for a given hyperlink structure. A simulation program is
developed for the PageRank algorithm because, of these algorithms,
only PageRank is implemented in a search engine (Google). The outputs
are shown in table and chart form.
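As a rough illustration of the kind of calculation such a simulation program performs, the sketch below computes PageRank by power iteration. It is a minimal sketch under stated assumptions, not the program described in the paper: the three-page graph `example_links`, the function name `pagerank`, the damping factor of 0.85, and the even redistribution of rank from pages without out-links are all choices made here for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page shares its current rank equally among its out-links.
                new_rank[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks have stabilised
            break
    return rank


# Hypothetical hyperlink structure (not the one used in the paper).
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

In this normalised form the ranks sum to 1. Page C, which receives links from both A and B, ends up ranked highest, and B, which receives only half of A's rank, ends up lowest.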
Abstract: XML files contain data in a well-formatted manner. Studying the format, or the semantics of the grammar, helps in retrieving the data quickly. Many algorithms have been described for searching data in XML files. A number of approaches rely on the data structure of the document, while others rely on its content. Structure-based approaches require the user to know the structure of the document, whereas information retrieval techniques based on natural language processing (NLP) relate only to the content of the document. As a result, the results may be irrelevant or incomplete, and the search may take more time. This paper presents fast XML retrieval techniques that use a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure, and assigns a value to each tag in the file. When querying the system, the user is not constrained to a fixed query format.
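To make the idea of indexing both content and structure concrete, the sketch below builds a small inverted index over an XML document. It is only an illustration under assumptions: the abstract does not specify the RXML indexing scheme or how tag values are assigned, so the sequential integer tag ids, the function names `build_index` and `search`, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(xml_text):
    """Map each word to the (tag id, element path) pairs where it occurs."""
    root = ET.fromstring(xml_text)
    tag_ids = {}               # hypothetical scheme: one integer per distinct tag
    index = defaultdict(set)   # word -> {(tag_id, path), ...}

    def visit(elem, path):
        path = f"{path}/{elem.tag}"
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids) + 1)
        # Index only the element's leading text; tail text is ignored here.
        for word in (elem.text or "").lower().split():
            index[word].add((tag_id, path))
        for child in elem:
            visit(child, path)

    visit(root, "")
    return tag_ids, index


def search(index, query):
    """Free-form keyword query: return locations containing every query word."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()


# Hypothetical sample document.
sample = (
    "<library><book><title>Web Mining</title>"
    "<author>Ada Lovelace</author></book></library>"
)
tag_ids, index = build_index(sample)
print(search(index, "web mining"))
```

The query is a plain list of keywords, so the user does not need to know the document structure in advance, yet each hit still reports the path at which the words were found.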