Abstract: The rapid expansion of the web is causing a
constant growth of information, which leads to problems such as
the increased difficulty of extracting potentially useful knowledge. Web
content mining confronts this problem by gathering explicit information
from different web sites for access and knowledge discovery.
Query interfaces of web databases share common building blocks.
After extracting interface information with a parsing approach, we use a new
data mining algorithm to match a large number of database schemas
at once, which increases the speed of information matching. In addition,
instead of simple 1:1 matching, the algorithm performs complex m:n matching
between query interfaces. In this paper we present a novel correlation
mining algorithm that matches correlated attributes at lower cost. The
algorithm uses the Jaccard measure to distinguish positively correlated
attributes from negatively correlated ones. Finally, the system matches
the user query against the query interfaces of a specific domain and
chooses the query interface closest to the user query to answer it.
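The correlation mining and query matching steps can be illustrated with a small sketch. It assumes each parsed query interface has been reduced to a set of attribute names, and it computes the Jaccard coefficient |A ∩ B| / |A ∪ B| over the sets of interfaces in which two attributes appear. A high coefficient is read as positive correlation (attributes that co-occur, such as a "first name"/"last name" group), and a zero coefficient between two sufficiently frequent attributes is read as negative correlation (candidate synonyms such as "author"/"writer"). The toy interfaces, the thresholds (`pos_threshold`, `neg_threshold`, `min_support`), the function names, and the choice of Jaccard similarity for picking the nearest interface are all hypothetical; the abstract does not specify them.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B| (defined as 0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def occurrences(interfaces: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each attribute to the set of interface ids in which it appears."""
    occ: dict[str, set[str]] = {}
    for iface, attrs in interfaces.items():
        for attr in attrs:
            occ.setdefault(attr, set()).add(iface)
    return occ


def mine_correlations(interfaces, pos_threshold=0.8, neg_threshold=0.0, min_support=2):
    """Split attribute pairs into positively and negatively correlated ones.

    Thresholds are hypothetical illustration values, not taken from the paper.
    """
    occ = occurrences(interfaces)
    # Ignore rare attributes: two rare attributes seldom co-occur by chance alone.
    frequent = sorted(a for a, ifs in occ.items() if len(ifs) >= min_support)
    positive, negative = [], []
    for a, b in combinations(frequent, 2):
        j = jaccard(occ[a], occ[b])
        if j >= pos_threshold:
            positive.append((a, b))  # usually appear together: candidate group
        elif j <= neg_threshold:
            negative.append((a, b))  # never appear together: candidate synonyms
    return positive, negative


def nearest_interface(query_attrs: set[str], interfaces: dict[str, set[str]]) -> str:
    """Return the id of the interface whose attributes best overlap the query."""
    return max(interfaces, key=lambda iface: jaccard(query_attrs, interfaces[iface]))


# Hypothetical toy domain: six book-search interfaces after parsing.
INTERFACES = {
    "i1": {"author", "title", "isbn"},
    "i2": {"writer", "title", "isbn"},
    "i3": {"author", "title", "isbn", "price"},
    "i4": {"first name", "last name", "title", "isbn"},
    "i5": {"first name", "last name", "title", "price"},
    "i6": {"writer", "title", "price"},
}

pos, neg = mine_correlations(INTERFACES)
print(pos)  # [('first name', 'last name')]
print(neg)
print(nearest_interface({"author", "title"}, INTERFACES))  # i1
```

On this toy data the negative pairs link "author", "writer", and the {"first name", "last name"} group, which is the kind of m:n correspondence the abstract refers to. Attributes that occur in every interface (here "title") never reach either threshold, so the minimum-support filter and the two thresholds together control how aggressive the matching is.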

Abstract: The internet has become an attractive avenue for
global e-business, e-learning, knowledge sharing, and more. Due to the
continuous increase in the volume of web content, it is not practical
for a user to extract information by browsing and integrating
data from the huge number of web sources retrieved by existing
search engines. Semantic web technology advances information
extraction by providing a suite of tools to integrate data from
different sources. To take full advantage of the semantic web,
existing web pages must be annotated to turn them into semantic web
pages. This research develops a tool named OWIE (Ontology-based
Web Information Extraction) for semantic web annotation using
domain-specific ontologies. The tool automatically extracts
information from HTML pages with the help of predefined ontologies
and gives the extracted information a semantic representation. Two
case studies were conducted to analyze the accuracy of OWIE.
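The general idea of ontology-guided annotation can be sketched as follows. The abstract does not describe how OWIE represents its ontologies or locates values, so this sketch makes loudly hypothetical choices: the "ontology" is a plain dict mapping a class (`Camera`) to properties, each paired with a regular expression that finds its value in the page text, and the semantic representation is a list of (subject, predicate, object) tuples in the spirit of RDF triples. Only Python's standard `html.parser` and `re` modules are used; `ONTOLOGY`, `TextCollector`, and `annotate` are illustrative names, not part of OWIE.

```python
import re
from html.parser import HTMLParser

# Hypothetical domain ontology: class -> {property: regex whose group 1 is the value}.
# A real OWL ontology would be far richer; this dict only stands in for one.
ONTOLOGY = {
    "Camera": {
        "hasBrand": re.compile(r"Brand:\s*([A-Za-z]+)"),
        "hasResolution": re.compile(r"(\d+(?:\.\d+)?)\s*MP\b"),
        "hasPrice": re.compile(r"\$\s*(\d+(?:\.\d{2})?)"),
    }
}


class TextCollector(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def annotate(html: str, page_id: str, ontology=ONTOLOGY):
    """Return (subject, predicate, object) tuples for each ontology property found."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks)
    triples = []
    for cls, properties in ontology.items():
        found = []
        for prop, pattern in properties.items():
            match = pattern.search(text)
            if match:
                found.append((prop, match.group(1)))
        if found:  # only type the page as cls if some property matched
            triples.append((page_id, "rdf:type", cls))
            triples.extend((page_id, prop, value) for prop, value in found)
    return triples


page = (
    "<html><body><h1>Acme Z5</h1><p>Brand: Acme</p><p>24.2 MP sensor</p>"
    "<p>Price: $499.99</p><script>var x = '$1';</script></body></html>"
)
for triple in annotate(page, "ex:page1"):
    print(triple)
```

Matching against visible text rather than raw markup keeps the patterns independent of page layout, at the cost of losing structural cues (such as table cells) that a real extractor would likely exploit.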