Abstract: Web search engines are designed to retrieve and
extract information from web databases and to return dynamic
web pages. The Semantic Web is an extension of the current web
that includes semantic content in web pages. The main goal of
the Semantic Web is to improve the quality of the current web by
converting its contents into machine-understandable form. A key
milestone for the Semantic Web is therefore to have semantic-level
information on the web. Nowadays, people use various keyword-based
search engines to find the relevant information they need from the web.
However, many words are polysemous. When such a word is
used to query a search engine, the engine returns Search Result Records
(SRRs) with different meanings. Here, SRRs with similar meanings are
grouped together using Word Sense Disambiguation (WSD). In
addition, semantic annotation is performed to improve the
efficiency of the search result records. Semantic annotation is the
process of adding semantic metadata to web resources. The
grouped SRRs are thus annotated, and a summary is generated that
describes the information in the SRRs. Automatic semantic
annotation, however, remains a significant challenge in the Semantic Web.
Here, ontology and knowledge-based representation are used to annotate
the web pages.
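The grouping and annotation steps described in this abstract can be illustrated with a minimal sketch. It assumes a simplified gloss-overlap heuristic (in the spirit of the Lesk algorithm) as the disambiguation step; the abstract does not say which WSD method is used. The sense inventory `SENSE_GLOSSES`, the sample records `SRRS`, and all function names are hypothetical, and the "annotation" here is only a sense label and a record count standing in for richer ontology-based metadata.

```python
from collections import defaultdict

# Hypothetical sense inventory for the polysemous query word "jaguar".
# A real system would take glosses from a lexical resource or an ontology.
SENSE_GLOSSES = {
    "animal": "large wild cat predator native to rainforest habitat in the americas",
    "car": "british luxury car manufacturer vehicle engine sedan dealer",
}

STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "for", "with"}


def tokens(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def best_sense(snippet, glosses):
    """Pick the sense whose gloss shares the most words with the snippet."""
    snippet_words = tokens(snippet)
    scores = {s: len(snippet_words & tokens(g)) for s, g in glosses.items()}
    sense = max(scores, key=scores.get)
    return sense if scores[sense] > 0 else "unknown"


def group_srrs(snippets, glosses):
    """Group search result records by their disambiguated sense."""
    groups = defaultdict(list)
    for snippet in snippets:
        groups[best_sense(snippet, glosses)].append(snippet)
    return dict(groups)


def annotate(groups):
    """Attach simple metadata (sense label, record count) to each group."""
    return [
        {"sense": sense, "count": len(records), "records": records}
        for sense, records in groups.items()
    ]


# Hypothetical search result snippets for the query "jaguar".
SRRS = [
    "The jaguar is a large cat found in rainforest regions.",
    "Find a Jaguar dealer and book a sedan test drive.",
    "Jaguar habitat loss threatens the predator.",
    "Jaguar unveils a new luxury vehicle engine.",
]

if __name__ == "__main__":
    for entry in annotate(group_srrs(SRRS, SENSE_GLOSSES)):
        print(entry["sense"], entry["count"])
```

Word overlap is a deliberately crude stand-in: it makes the grouping idea visible without any external lexical resource, but it would miss records that share no vocabulary with a gloss.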
Abstract: The rapid expansion of the web is causing
constant growth of information, leading to several problems such as
the increased difficulty of extracting potentially useful knowledge. Web
content mining addresses this problem by gathering explicit information
from different web sites for access and knowledge discovery.
Query interfaces of web databases share common building blocks.
After extracting information with a parsing approach, we use a new
data mining algorithm to match a large number of database
schemas at once, which increases the speed of
information matching. In addition, instead of simple 1:1 matching,
the algorithm performs complex (m:n) matching between query interfaces. In this
paper we present a novel correlation mining algorithm that matches
correlated attributes at lower cost. The algorithm uses the Jaccard
measure to distinguish positively and negatively correlated attributes.
The system then matches the user query against the query
interfaces of a specific domain and finally chooses the query
interface nearest to the user query to answer it.
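As a rough illustration of how the Jaccard measure can separate positively from negatively correlated attributes, the sketch below treats each query interface as a set of attribute labels and computes, for each attribute pair, the ratio of interfaces containing both to interfaces containing either. The sample `INTERFACES`, the two thresholds, and the use of Jaccard similarity to pick the nearest interface for a user query are all assumptions made for this example; the abstract does not specify them.

```python
from itertools import combinations

# Hypothetical query interfaces from a "books" domain; each set holds the
# attribute labels that appear on one search form.
INTERFACES = [
    {"title", "author", "isbn"},
    {"title", "author", "isbn", "price"},
    {"title", "writer", "isbn"},
    {"title", "writer", "price"},
]

# Assumed thresholds, chosen only for this illustration.
POSITIVE_THRESHOLD = 0.6
NEGATIVE_THRESHOLD = 0.1


def occurrences(attribute, interfaces):
    """Return the indices of the interfaces that contain the attribute."""
    return {i for i, attrs in enumerate(interfaces) if attribute in attrs}


def jaccard(a, b):
    """Jaccard measure of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_pairs(interfaces):
    """Label attribute pairs as positively or negatively correlated."""
    attributes = sorted(set().union(*interfaces))
    positive, negative = [], []
    for x, y in combinations(attributes, 2):
        score = jaccard(occurrences(x, interfaces), occurrences(y, interfaces))
        if score >= POSITIVE_THRESHOLD:
            positive.append((x, y))  # attributes that tend to co-occur
        elif score <= NEGATIVE_THRESHOLD:
            negative.append((x, y))  # attributes that rarely co-occur
    return positive, negative


def nearest_interface(query_attributes, interfaces):
    """Return the index of the interface most similar to the user query."""
    return max(
        range(len(interfaces)),
        key=lambda i: jaccard(set(query_attributes), interfaces[i]),
    )


if __name__ == "__main__":
    pos, neg = classify_pairs(INTERFACES)
    print("positive:", pos)
    print("negative:", neg)
    print("nearest:", nearest_interface({"author", "price"}, INTERFACES))
```

In this toy data, `author` and `writer` never share an interface, so they come out as negatively correlated, which is the pattern one would expect of alternative labels for the same concept.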
Abstract: The explosive growth of the World Wide Web has posed
a challenging problem in extracting relevant data. Traditional web
crawlers focus only on the surface web while the deep web keeps
expanding behind the scenes. Deep web pages are created
dynamically as a result of queries posed to specific web databases.
The structure of the deep web pages makes it impossible for
traditional web crawlers to access deep web content. This paper
presents Deep iCrawl, a novel vision-based approach for extracting
data from the deep web. Deep iCrawl splits the process into two
phases. The first phase includes query analysis and query translation,
and the second covers vision-based extraction of data from the
dynamically created deep web pages. Several established
approaches exist for extracting deep web pages, but the proposed
method aims to overcome their inherent limitations.
This paper also aims to compare the data items and present them
in the required order.
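The first phase and the final ordering step can be pictured with a small sketch. This is not Deep iCrawl's actual procedure, which the abstract does not detail: the `attr: value` query syntax, the `FORM_FIELDS` mapping, and every function name below are hypothetical, and the vision-based extraction phase is left out entirely.

```python
from urllib.parse import urlencode

# Hypothetical description of one web database's search form: it maps a
# generic attribute name to the field name that the form expects.
FORM_FIELDS = {"title": "book_title", "author": "auth_name"}


def analyze_query(raw_query):
    """Query analysis: parse 'attr: value' pairs separated by semicolons."""
    pairs = {}
    for part in raw_query.split(";"):
        if ":" in part:
            attr, value = part.split(":", 1)
            pairs[attr.strip().lower()] = value.strip()
    return pairs


def translate_query(pairs, form_fields):
    """Query translation: keep the attributes the form supports, rename
    them to the form's field names, and encode them as a query string."""
    params = {form_fields[a]: v for a, v in pairs.items() if a in form_fields}
    return urlencode(params)


def order_items(items, key):
    """Compare data items on one field and return them in ascending order."""
    return sorted(items, key=lambda item: item[key])


if __name__ == "__main__":
    parsed = analyze_query("title: deep web; author: Smith; year: 2010")
    print(translate_query(parsed, FORM_FIELDS))
    print(order_items([{"price": 30}, {"price": 10}], "price"))
```

Attributes the form does not support (here `year`) are simply dropped during translation; a fuller design would need a policy for such cases, for example filtering the results afterwards.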