Abstract: Search engines play an important role on the internet by
retrieving relevant documents from among a huge number of web
pages. However, a search typically returns a large number of
documents, all of which are relevant to the search topic. To
retrieve the most meaningful documents for a search topic, a
ranking algorithm is used as part of the information retrieval
technique. One of the issues in data mining is ranking the
retrieved documents, which is also a practical problem in
information retrieval. This paper surveys various page ranking
algorithms and page segmentation algorithms, and compares those
algorithms as used for information retrieval. Diverse Page Rank
based algorithms, including Page Rank (PR), Weighted Page Rank (WPR),
Weighted Page Content Rank (WPCR), Hyperlink-Induced Topic
Search (HITS), Distance Rank, EigenRumor, Time Rank,
Tag Rank, Relational Based Page Rank and Query Dependent
Ranking, are discussed and compared.
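
As an illustration of the basic Page Rank (PR) idea named above, the following is a minimal sketch of the standard power-iteration formulation, not code taken from the surveyed paper. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the four-page example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Rank pages by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is the conventional value, assumed here.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every page it links to.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: C is linked from A, B and D.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(example_links)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links, C, receives the highest rank, while D, which no page links to, keeps only the baseline share (1 - damping) / n. The other algorithms listed in the abstract use different signals and are not represented by this sketch.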
Abstract: This paper describes a fast and efficient method for page segmentation of documents containing nonrectangular blocks. The segmentation is based on an edge-following algorithm that uses a small window of 16 by 32 pixels. It is very fast because only the border pixels of each paragraph are examined, without scanning the whole page. However, the segmentation may contain errors if the space between columns is smaller than the window used in edge following. This paper reduces such errors by first identifying each missed segmentation point using the direction information from edge following, and then applying an X-Y cut at that point to separate the connected columns. The advantage of the proposed method is the fast identification of missed segmentation points. The method is faster and incurs less overhead than algorithms that need to access many more pixels of a document.
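
The X-Y cut step mentioned above can be illustrated with a single vertical cut based on a projection profile. This is only a rough sketch under stated assumptions: the block is taken to be a small binary image stored as a list of rows (1 for ink, 0 for background), the names `vertical_cut` and `split_columns` are hypothetical, and the edge-following stage that locates the missed segmentation point is not shown because the abstract does not specify it.

```python
def vertical_cut(block, min_gap=1):
    """Find the widest interior run of empty columns in a binary block.

    Returns (start, end) with end exclusive, or None if no such run exists.
    """
    width = len(block[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in block) for x in range(width)]

    best = None
    x = 0
    while x < width:
        if profile[x] != 0:
            x += 1
            continue
        start = x
        while x < width and profile[x] == 0:
            x += 1
        # Keep only gaps with ink on both sides, so margins are ignored.
        is_interior = start > 0 and x < width
        if is_interior and (x - start) >= min_gap:
            if best is None or (x - start) > (best[1] - best[0]):
                best = (start, x)
    return best


def split_columns(block, min_gap=1):
    """Split a block into left and right parts at its widest vertical gap."""
    gap = vertical_cut(block, min_gap)
    if gap is None:
        return [block]
    start, end = gap
    left = [row[:start] for row in block]
    right = [row[end:] for row in block]
    return [left, right]


# Hypothetical 3x8 block: two ink regions separated by two empty columns.
page_block = [
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 1],
]
gap = vertical_cut(page_block)
columns = split_columns(page_block)
```

A full X-Y cut applies horizontal and vertical cuts recursively over the whole page. In the method the abstract describes, the cut is applied only at the missed segmentation point, which is why restricting the profile to one block, as this sketch does, is a reasonable simplification.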