Abstract: The internet has become an attractive avenue for
global e-business, e-learning, knowledge sharing, etc. Due to the
continuous increase in the volume of web content, it is not practically
possible for a user to extract information by browsing and integrating
data from the huge number of web sources retrieved by existing
search engines. Semantic web technology enables advances in
information extraction by providing a suite of tools to integrate
data from different sources. To take full advantage of the semantic
web, it is necessary to convert existing web pages into semantic web
pages. This research develops a tool, named OWIE (Ontology-based
Web Information Extraction), for semantic web annotation using
domain-specific ontologies. The tool automatically extracts
information from HTML pages with the help of predefined ontologies
and gives it a semantic representation. Two case studies have been
conducted to analyze the accuracy of OWIE.
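To make the extraction step concrete, the following is a minimal sketch of ontology-driven extraction from HTML in Python. It assumes each ontology property carries a regular-expression pattern for its values; the property names, patterns, and helper functions are illustrative assumptions, not OWIE's actual implementation.

    import re
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collect the visible text of an HTML page."""
        def __init__(self):
            super().__init__()
            self.chunks = []
        def handle_data(self, data):
            self.chunks.append(data)

    # Hypothetical ontology fragment: property -> pattern for its values.
    ONTOLOGY = {
        "hasEmail": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "hasPhone": r"\+?\d[\d ()-]{7,}\d",
    }

    def annotate(html, subject):
        """Return (subject, property, value) triples found in the page text."""
        parser = TextExtractor()
        parser.feed(html)
        text = " ".join(parser.chunks)
        return [(subject, prop, value)
                for prop, pattern in ONTOLOGY.items()
                for value in re.findall(pattern, text)]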
Abstract: An ontology is widely used in many kinds of applications as a knowledge representation tool for domain knowledge. However, even if an ontology schema is well prepared by domain experts, it is tedious and cost-intensive to add instances to the ontology. The most confident and trustworthy way to add instances to the ontology is to gather them from tables in related Web pages. In automatically populating instances, the primary task is to find the most appropriate concept among all possible concepts in the ontology for a given table. This paper proposes a novel method for this problem by defining the similarity between the table and the concept using the overlap of their properties. According to a series of experiments, the proposed method achieves an accuracy of 76.98%. This implies that the proposed method is a plausible way to populate an ontology automatically from Web tables.
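As a rough illustration of matching a Web table to an ontology concept by property overlap, the sketch below scores each concept by a Jaccard-style overlap between the table's column headers and the concept's property names; the function names and the toy ontology are assumptions for illustration, not the paper's actual measure.

    def property_overlap(table_headers, concept_properties):
        """Jaccard-style overlap between a table's column headers and
        the property names of an ontology concept."""
        headers = {h.strip().lower() for h in table_headers}
        props = {p.strip().lower() for p in concept_properties}
        if not headers or not props:
            return 0.0
        return len(headers & props) / len(headers | props)

    def best_concept(table_headers, ontology):
        """Pick the concept whose properties best overlap the table.
        ontology: dict mapping concept name -> list of property names."""
        return max(ontology, key=lambda c: property_overlap(table_headers, ontology[c]))

    ontology = {"Person": ["name", "birth date", "nationality"],
                "Movie": ["title", "director", "release year"]}
    print(best_concept(["Name", "Nationality", "Club"], ontology))  # -> Person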
Abstract: Choosing the right metadata is critical, as good
information (metadata) attached to an image will facilitate its
visibility among a pile of other images. The image's value is enhanced
not only by the quality of the attached metadata but also by the
technique of the search. This study proposes a simple but efficient
technique for predicting a single human image from a website using the
basic image data and the embedded metadata of the image's content
appearing on web pages. The results are very encouraging, with a
prediction accuracy of 95%. This technique may be of great
assistance to librarians, researchers and many others in automatically
and efficiently identifying a set of human images out of a larger set
of images.
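The sketch below illustrates how basic image data and embedded metadata could be combined to pick the most likely human image; the keyword list, aspect-ratio test, and thresholds are invented for illustration and do not reproduce the study's technique.

    HUMAN_TERMS = {"person", "people", "portrait", "man", "woman", "face", "staff"}

    def score_image(img):
        """Heuristic score from metadata text plus basic image data.
        img: dict with keys such as alt, filename, caption, width, height."""
        text = " ".join([img.get("alt", ""), img.get("filename", ""),
                         img.get("caption", "")]).lower()
        score = sum(term in text for term in HUMAN_TERMS)
        w, h = img.get("width", 0), img.get("height", 0)
        if h and 0.6 <= w / h <= 0.9:        # portrait-like aspect ratio
            score += 1
        if w * h >= 100 * 100:               # skip icons and thumbnails
            score += 1
        return score

    def predict_human_image(images):
        """Return the image most likely to depict a person."""
        return max(images, key=score_image)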
Abstract: In our modern world, more and more physical transactions are being replaced by electronic transactions (e.g. banking, shopping, and payments), and many businesses and companies are performing most of their operations through the internet. Instead of physical commerce, internet visitors are now adopting electronic commerce (e-Commerce). The ability of web users to reach products worldwide can greatly benefit from the creation of friendly and personalized online business portals. Internet visitors will return to a particular website when they can easily find the information they need or want. Dealing with this human conceptualization calls for the incorporation of Artificial/Computational Intelligence techniques in the creation of customized portals. Among these techniques, Fuzzy-Set technologies can make many useful contributions to the development of such a human-centered endeavor as e-Commerce. The main objective of this paper is the implementation of a Paradigm for the Intelligent Design and Operation of Human-Computer Interfaces. In particular, the paradigm is well suited to the intelligent design and operation of software modules that display information (such as Web pages, graphical user interfaces (GUIs), and multimedia modules) on a computer screen. The human conceptualization of the user's personal information is analyzed through a Cascaded Fuzzy Inference (decision-making) System to generate the User Ascribe Qualities, which identify the user and can be used to customize portals with appropriate Web links.
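As a minimal sketch of what a cascaded (two-stage) fuzzy inference chain can look like, the code below feeds the output of a first rule base into a second one. The linguistic variables, membership ranges, and rules are invented for illustration and are not the paper's Cascaded Fuzzy Inference System.

    def tri(x, a, b, c):
        """Triangular membership function."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def engagement(pages_visited, minutes_on_site):
        """Stage 1: zero-order Sugeno rules mapping raw browsing data
        to an 'engagement' degree in [0, 1]."""
        low_p, high_p = tri(pages_visited, -1, 0, 6), tri(pages_visited, 3, 10, 50)
        low_t, high_t = tri(minutes_on_site, -1, 0, 8), tri(minutes_on_site, 4, 15, 60)
        rules = [(min(high_p, high_t), 1.0),   # many pages AND long stay -> engaged
                 (min(low_p, low_t), 0.0),     # few pages AND short stay -> not engaged
                 (max(min(high_p, low_t), min(low_p, high_t)), 0.5)]
        total = sum(w for w, _ in rules)
        return sum(w * out for w, out in rules) / total if total else 0.0

    def personalization(engagement_deg, past_purchases):
        """Stage 2: cascade the stage-1 output with purchase history
        to ascribe a 'personalization' quality to the user."""
        loyal, casual = tri(past_purchases, 0, 10, 30), tri(past_purchases, -1, 0, 10)
        rules = [(min(engagement_deg, loyal), 1.0),
                 (min(1 - engagement_deg, casual), 0.2)]
        total = sum(w for w, _ in rules)
        return sum(w * out for w, out in rules) / total if total else 0.5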
Abstract: Web 2.0 (social networking, blogging and online
forums) can serve as a data source for social science research because
it contains a vast amount of information from many different users.
The volume of that information has been growing at a very high rate
and is becoming a network of heterogeneous data; this makes content
difficult to find and therefore of limited use. We have proposed
a novel theoretical model for gathering and processing data from
Web 2.0 which reflects the semantic content of web pages in a
better way. This article deals with the analysis part of the model and
its usage for content analysis of blogs. The introductory part of the
article describes the methodology for gathering and processing data
from blogs. The next part of the article focuses on the evaluation
and content analysis of blogs that write about a specific trend.
Abstract: With the proliferation of the World Wide Web, the
development of web-based technologies and the growth in web
content, the structure of a website becomes more complex and web
navigation becomes a critical issue to both web designers and users.
In this paper, we identify content and web pages as two important
and influential factors in website navigation, and we recast the
enhancement of website navigation as making useful changes to the
link structure of the website based on these factors. We then
suggest a new method for proposing such changes, using a fuzzy
approach to optimize the website architecture. Applying the proposed
method to the real case of the Iranian Civil Aviation Organization
(CAO) website, we discuss the results of this novel approach in the
final section.
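The sketch below gives one possible reading of the fuzzy step: each candidate link is assigned a fuzzy "usefulness" degree combining the popularity of the target page and the content similarity of the two pages, and links above a threshold are proposed. The variable names, ranges, and the min t-norm are assumptions, not the paper's method.

    def link_usefulness(target_popularity, content_similarity):
        """Fuzzy degree to which adding a link would help navigation,
        combining 'popular target' and 'related content' with a min t-norm."""
        popular = min(1.0, target_popularity / 1000.0)     # e.g. visits per month, capped
        related = max(0.0, min(1.0, content_similarity))   # e.g. cosine similarity
        return min(popular, related)

    def propose_links(candidates, threshold=0.6):
        """candidates: iterable of (source, target, popularity, similarity).
        Keep the pairs whose fuzzy usefulness exceeds the threshold."""
        return [(src, dst) for src, dst, pop, sim in candidates
                if link_usefulness(pop, sim) >= threshold]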
Abstract: This paper presents a semi-supervised learning algorithm called Iterative Cross-Training (ICT) to solve Web page classification problems. We apply inductive logic programming (ILP) as a strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner to boost the performance of the weak learner in ICT. We compare the result with supervised Naive Bayes, a well-known algorithm for the text classification problem. The performance of our learning algorithm is also compared with other semi-supervised learning algorithms, namely Co-Training and EM. The experimental results show that the ICT algorithm outperforms those algorithms and that the performance of the weak learner can be enhanced by the ILP system.
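A minimal sketch of the cross-training loop is shown below, with two generic scikit-learn text classifiers standing in for the paper's ILP strong learner and its weak learner; each learner is retrained on labels produced by the other. The learner choice, round count, and use of the whole unlabeled pool are assumptions for illustration only.

    import numpy as np
    import scipy.sparse as sp
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB

    def iterative_cross_training(labeled_docs, labels, unlabeled_docs, rounds=5):
        """Two learners repeatedly exchange pseudo-labels for the unlabeled pool."""
        vec = TfidfVectorizer()
        X_lab = vec.fit_transform(labeled_docs)
        X_unl = vec.transform(unlabeled_docs)
        y = np.asarray(labels)
        strong, weak = LogisticRegression(max_iter=1000), MultinomialNB()
        strong.fit(X_lab, y)
        weak.fit(X_lab, y)
        for _ in range(rounds):
            pseudo_from_strong = strong.predict(X_unl)   # labels for the weak learner
            pseudo_from_weak = weak.predict(X_unl)       # labels for the strong learner
            X_all = sp.vstack([X_lab, X_unl])
            strong.fit(X_all, np.concatenate([y, pseudo_from_weak]))
            weak.fit(X_all, np.concatenate([y, pseudo_from_strong]))
        return strong, weak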
Abstract: The explosive growth of the World Wide Web has posed
a challenging problem in extracting relevant data. Traditional web
crawlers focus only on the surface web while the deep web keeps
expanding behind the scenes. Deep web pages are created
dynamically as a result of queries posed to specific web databases.
The structure of the deep web pages makes it impossible for
traditional web crawlers to access deep web contents. This paper
presents Deep iCrawl, a novel vision-based approach for extracting
data from the deep web. Deep iCrawl splits the process into two
phases. The first phase includes query analysis and query translation,
and the second covers vision-based extraction of data from the
dynamically created deep web pages. There are several established
approaches for the extraction of deep web pages, but the proposed
method aims at overcoming their inherent limitations.
This paper also aims at comparing the data items and presenting them
in the required order.
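As a rough sketch of the query-translation idea in the first phase, the code below maps the fields of a global query onto the labels of a site-specific search form via simple synonym lists; the field names and synonyms are invented and do not reflect Deep iCrawl's actual algorithm.

    # Hypothetical synonym lists for matching global query fields to form labels.
    FIELD_SYNONYMS = {"title": {"title", "book title", "name"},
                      "author": {"author", "writer", "by"}}

    def translate_query(global_query, form_labels):
        """global_query: dict of global field -> value.
        form_labels: labels found on the target site's search form.
        Returns the form label to fill in for each query field, where possible."""
        filled = {}
        for field, value in global_query.items():
            for label in form_labels:
                if label.lower() in FIELD_SYNONYMS.get(field, {field}):
                    filled[label] = value
                    break
        return filled

    print(translate_query({"title": "Dune", "author": "Herbert"},
                          ["Book Title", "Writer", "Year"]))
    # -> {'Book Title': 'Dune', 'Writer': 'Herbert'}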
Abstract: The third phase of the web, the semantic web, requires many web pages that are annotated with metadata. Thus, a crucial question is where to acquire these metadata. In this paper we propose a semi-automatic method that annotates the texts of documents and web pages and employs a fairly comprehensive knowledge base to categorize instances with respect to an ontology. The approach is evaluated against manual annotations and against one of the most popular annotation tools, which works in the same way as our tool. The approach is implemented in the .NET framework and uses WordNet as its knowledge base, yielding an annotation tool for the Semantic Web.
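One way to use WordNet for categorizing instances against an ontology is to walk a term's hypernym chain until a known root concept is reached, as in the sketch below (using NLTK's WordNet interface in Python); the ontology roots and their mapping are illustrative assumptions, not the tool's actual .NET implementation.

    from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

    # Hypothetical mapping from WordNet root lemmas to ontology classes.
    ONTOLOGY_ROOTS = {"person": "Person", "organization": "Organization",
                      "location": "Location", "artifact": "Artifact"}

    def categorize(term):
        """Return an ontology class for a term by breadth-first search
        up its WordNet hypernym chains."""
        for synset in wn.synsets(term, pos=wn.NOUN):
            queue = [synset]
            while queue:
                s = queue.pop(0)
                lemma = s.name().split(".")[0]     # e.g. 'person.n.01' -> 'person'
                if lemma in ONTOLOGY_ROOTS:
                    return ONTOLOGY_ROOTS[lemma]
                queue.extend(s.hypernyms())
        return None

    print(categorize("teacher"), categorize("city"))   # -> Person Location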
Abstract: With the enormous growth of the web, users easily get
lost in its rich hyper structure. Thus, developing user-friendly,
automated tools that provide users with relevant information, without
any redundant links, to cater to their needs is a primary task for
website owners. Most of the existing web mining algorithms have
concentrated on finding frequent patterns while neglecting the less
frequent ones, which are likely to contain outlying data such as
noise and irrelevant or redundant data. This paper proposes a new
algorithm for mining web content by detecting redundant links in
web documents using set-theoretic operations from classical
mathematics, such as subset, union, and intersection. The redundant
links are then removed from the original web content so that the
user obtains the required information.
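A minimal sketch of the set-theoretic idea, assuming each page is represented by its set of outgoing link URLs: links shared by every page (site-wide boilerplate) are treated as redundant and removed via plain intersection and difference. The names and the redundancy criterion are illustrative, not the paper's exact algorithm.

    def clean_pages(site):
        """site: dict mapping page URL -> set of outgoing link URLs.
        Links present on every page are treated as redundant boilerplate
        and removed from each page's link set."""
        common = set.intersection(*site.values())        # links shared by all pages
        return {url: links - common for url, links in site.items()}

    site = {"a.html": {"home", "contact", "news1"},
            "b.html": {"home", "contact", "news2"}}
    print(clean_pages(site))   # 'home' and 'contact' removed from both pages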
Abstract: People from different cultures favor web pages
characterized by the values of their culture and, therefore, tend to
prefer different characteristics of a website according to their cultural
values in terms of navigation, security, product information, customer
service, shopping and design tools. For a company aiming to
globalize its market, it is useful to implement country-specific
cultural interfaces and different websites for countries with
different cultures. This paper, following the conclusions of the
models of Hall and Hofstede and the studies of Marcus and Gould, defines,
through an empirical analysis, the guidelines of web design for both
the Scandinavian countries and Malaysia.
Abstract: The main aim of this paper is to analyse how corporate
web pages can become an essential tool for detecting the strategic
trends of firms or sectors, and even a primary source for
benchmarking. This technique has made it possible to identify the key
issues in the strategic management of the most outstanding large Spanish
firms and also to describe trends in their long-range planning, a way of
working that can be generalised to any country or firm group. More
precisely, two objectives were sought. The first one consisted in showing
the way in which corporate websites make it possible to obtain direct
information about the strategic variables which can define firms. This
tool is dynamic (since web pages are constantly updated) as well as
direct and reliable, since the information comes from the firm itself, not
from the comments of third parties (such as journalists, academicians,
consultants...). When this information is analysed for a group of firms,
one can observe their characteristics in terms of both managerial tasks
and business management. As for the second objective, the methodology
proposed served to describe the corporate profile of the large Spanish
enterprises included in the Ibex35 (the Ibex35 or Iberia Index is the
reference index of the Spanish Stock Exchange and periodically gathers
the 35 most outstanding Spanish firms). An attempt is therefore made to
define the long-range planning that would be characteristic of the largest
Spanish firms.
Abstract: In order to make surfing the internet faster and to avoid the redundant processing load incurred with each request for the same web page, many caching techniques have been developed to reduce the latency of retrieving data on the World Wide Web. In this paper we give a quick overview of existing web caching techniques used for dynamic web pages, and then introduce a design and implementation model that takes advantage of the "URL Rewriting" feature in some popular web servers, e.g. Apache, to provide an effective approach to caching dynamic web pages.
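The sketch below illustrates the general idea in Python: a dynamic URL such as /product.php?id=42 is rewritten to a flat cache key, and the cached copy is served while it is fresh. The file layout, hashing, and TTL are assumptions for illustration; the paper's model targets the web server's own URL Rewriting feature (e.g. Apache's mod_rewrite) rather than application code.

    import hashlib, os, time

    CACHE_DIR, TTL_SECONDS = "cache", 300

    def cache_key(url):
        """Deterministically rewrite a dynamic URL into a flat cache file name."""
        return os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest() + ".html")

    def fetch(url, generate_page):
        """Serve the cached page when fresh; otherwise regenerate and store it."""
        path = cache_key(url)
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < TTL_SECONDS:
            with open(path, encoding="utf-8") as f:
                return f.read()                          # cache hit
        html = generate_page(url)                        # cache miss: run the dynamic script
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        return html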
Abstract: The ever-increasing use of the World Wide Web over the
existing network results in poor performance. Several techniques
have been developed for reducing web traffic, such as compressing
files, saving web pages at the client side, and changing the bursty
nature of traffic into a constant rate. No single method is
adequate to access documents instantly through the
Internet. In this paper, adaptive hybrid algorithms are developed for
reducing web traffic. Intelligent agents are used for monitoring the
web traffic. Depending upon the bandwidth usage, user's preferences,
server and browser capabilities, intelligent agents use the best
techniques to achieve maximum traffic reduction. Web caching,
compression, filtering, optimization of HTML tags, and traffic
dispersion are incorporated into this adaptive selection. Using this
new hybrid technique, latency is reduced to 20-60% and the cache hit
ratio is increased to 40-82%.
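A rough sketch of how an agent might select among the listed techniques from monitored conditions is given below; the thresholds and decision rules are invented for illustration and are not the paper's adaptive hybrid algorithm.

    def select_techniques(bandwidth_kbps, browser_supports_gzip, cache_hit_ratio):
        """Pick traffic-reduction techniques from monitored conditions."""
        chosen = []
        if cache_hit_ratio < 0.5:
            chosen.append("web caching")                 # cold cache: cache aggressively
        if browser_supports_gzip:
            chosen.append("compression")                 # client can decompress
        if bandwidth_kbps < 256:
            chosen += ["HTML tag optimization", "content filtering"]
        else:
            chosen.append("traffic dispersion")          # enough capacity to parallelize
        return chosen

    print(select_techniques(128, True, 0.3))
    # -> ['web caching', 'compression', 'HTML tag optimization', 'content filtering']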