Abstract: Web usage mining is an interesting application of data
mining which provides insight into customer behaviour on the Internet. An important technique to discover user access and navigation trails is based on sequential patterns mining. One of the
key challenges for web access patterns mining is tackling the problem
of mining richly structured patterns. This paper proposes a novel
model called Web Access Patterns Graph (WAP-Graph) to represent all of the access patterns from web mining graphically. WAP-Graph
also motivates the search for new structural relation patterns, i.e. Concurrent Access Patterns (CAP), to identify and predict more
complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in the
search for and representation of concurrency between access patterns
on the web. From experiments conducted on large-scale synthetic
sequence data as well as real web access data, it is demonstrated that
CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.
Abstract: This research uses computational linguistics, an area of study that employs a computer to process natural language, and aims at discerning the patterns that exist in declarative sentences used in technical texts. The approach is mathematical, and the focus is on instructional texts found on web pages. The technique developed by the author and named the MAYA Semantic Technique is used here and organized into four stages. In the first stage, the parts of speech in each sentence are identified. In the second stage, the subject of the sentence is determined. In the third stage, MAYA performs a frequency analysis on the remaining words to determine the verb and its object. In the fourth stage, MAYA does statistical analysis to determine the content of the web page. The advantage of the MAYA Semantic Technique lies in its use of mathematical principles to represent grammatical operations which assist processing and accuracy if performed on unambiguous text. The MAYA Semantic Technique is part of a proposed architecture for an entire web-based intelligent tutoring system. On a sample set of sentences, partial semantics derived using the MAYA Semantic Technique were approximately 80% accurate. The system currently processes technical text in one domain, namely Cµ programming. In this domain all the keywords and programming concepts are known and understood.
Abstract: This article describes a Web pages automatic filtering system. It is an open and dynamic system based on multi agents architecture. This system is built up by a set of agents having each a quite precise filtering task of to carry out (filtering process broken up into several elementary treatments working each one a partial solution). New criteria can be added to the system without stopping its execution or modifying its environment. We want to show applicability and adaptability of the multi-agents approach to the networks information automatic filtering. In practice, most of existing filtering systems are based on modular conception approaches which are limited to centralized applications which role is to resolve static data flow problems. Web pages filtering systems are characterized by a data flow which varies dynamically.
Abstract: In the proposed method for Web page-ranking, a
novel theoretic model is introduced and tested by examples of order
relationships among IP addresses. Ranking is induced using a
convexity feature, which is learned according to these examples
using a self-organizing procedure. We consider the problem of selforganizing
learning from IP data to be represented by a semi-random
convex polygon procedure, in which the vertices correspond to IP
addresses. Based on recent developments in our regularization
theory for convex polygons and corresponding Euclidean distance
based methods for classification, we develop an algorithmic
framework for learning ranking functions based on a Computational
Geometric Theory. We show that our algorithm is generic, and
present experimental results explaining the potential of our approach.
In addition, we explain the generality of our approach by showing its
possible use as a visualization tool for data obtained from diverse
domains, such as Public Administration and Education.
Abstract: Current research on semantic web aims at making intelligent web pages meaningful for machines. In this way, ontology plays a primary role. We believe that logic can help ontology languages (such as OWL) to be more fluent and efficient. In this paper we try to combine logic with OWL to reduce some disadvantages of this language. Therefore we extend OWL by logic and also show how logic can satisfy our future expectations of an ontology language.
Abstract: Information hiding, especially watermarking is a
promising technique for the protection of intellectual property rights.
This technology is mainly advanced for multimedia but the same has
not been done for text. Web pages, like other documents, need a
protection against piracy. In this paper, some techniques are
proposed to show how to hide information in web pages using some
features of the markup language used to describe these pages. Most
of the techniques proposed here use the white space to hide
information or some varieties of the language in representing
elements. Experiments on a very small page and analysis of five
thousands web pages show that these techniques have a wide
bandwidth available for information hiding, and they might form a
solid base to develop a robust algorithm for web page watermarking.
Abstract: With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content like advertisements, navigation-panels, copyright notices etc., surrounding the main content of the web page. Hence, tools for the mining of data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques to mine data regions from web pages are still unsatisfactory because of their poor performance and tag-dependence. In this paper a novel method to extract data items from the web pages automatically is proposed. It comprises of two steps: (1) Identification and Extraction of the data regions based on visual clues information. (2) Identification of data records and extraction of data items from a data region. For step1, a novel and more effective method is proposed based on visual clues, which finds the data regions formed by all types of tags using visual clues. For step2 a more effective method namely, Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine the non-contiguous data records and can correctly identify data regions, irrespective of the type of tag in which it is bound. Our experimental results show that the proposed technique performs better than the existing techniques.
Abstract: This paper presents a web based remote access
microcontroller laboratory. Because of accelerated development in
electronics and computer technologies, microcontroller-based devices
and appliances are found in all aspects of our daily life. Before the
implementation of remote access microcontroller laboratory an
experiment set is developed by teaching staff for training
microcontrollers. Requirement of technical teaching and industrial
applications are considered when experiment set is designed.
Students can make the experiments by connecting to the experiment
set which is connected to the computer that set as the web server. The
students can program the microcontroller, can control digital and
analog inputs and can observe experiment. Laboratory experiment
web page can be accessed via www.elab.aku.edu.tr address.
Abstract: The internet has become an attractive avenue for
global e-business, e-learning, knowledge sharing, etc. Due to
continuous increase in the volume of web content, it is not practically
possible for a user to extract information by browsing and integrating
data from a huge amount of web sources retrieved by the existing
search engines. The semantic web technology enables advancement
in information extraction by providing a suite of tools to integrate
data from different sources. To take full advantage of semantic web,
it is necessary to annotate existing web pages into semantic web
pages. This research develops a tool, named OWIE (Ontology-based
Web Information Extraction), for semantic web annotation using
domain specific ontologies. The tool automatically extracts
information from html pages with the help of pre-defined ontologies
and gives them semantic representation. Two case studies have been
conducted to analyze the accuracy of OWIE.
Abstract: The recent development of Information and Communication Technology (ICT) enables new ways of "democratic" decision-making such as a page-ranking system, which estimates the importance of a web page based on indirect trust on that page shared by diverse group of unorganized individuals. These kinds of "democracy" have not been acclaimed yet in the world of real politics. On the other hand, a large amount of data about personal relations including trust, norms of reciprocity, and networks of civic engagement has been accumulated in a computer-readable form by computer systems (e.g., social networking systems). We can use these relations as a new type of social capital to construct a new democratic decision-making system based on a delegation network. In this paper, we propose an effective decision-making support system, which is based on empowering someone's vote whom you trust. For this purpose, we propose two new techniques: the first is for estimating entire vote distribution from a small number of votes, and the second is for estimating active voter choice to promote voting using a delegation network. We show that these techniques could increase the voting ratio and credibility of the whole decision by agent-based simulations.
Abstract: An ontology is widely used in many kinds of applications as a knowledge representation tool for domain knowledge. However, even though an ontology schema is well prepared by domain experts, it is tedious and cost-intensive to add instances into the ontology. The most confident and trust-worthy way to add instances into the ontology is to gather instances from tables in the related Web pages. In automatic populating of instances, the primary task is to find the most proper concept among all possible concepts within the ontology for a given table. This paper proposes a novel method for this problem by defining the similarity between the table and the concept using the overlap of their properties. According to a series of experiments, the proposed method achieves 76.98% of accuracy. This implies that the proposed method is a plausible way for automatic ontology population from Web tables.
Abstract: Choosing the right metadata is a critical, as good
information (metadata) attached to an image will facilitate its
visibility from a pile of other images. The image-s value is enhanced
not only by the quality of attached metadata but also by the technique
of the search. This study proposes a technique that is simple but
efficient to predict a single human image from a website using the
basic image data and the embedded metadata of the image-s content
appearing on web pages. The result is very encouraging with the
prediction accuracy of 95%. This technique may become a great
assist to librarians, researchers and many others for automatically and
efficiently identifying a set of human images out of a greater set of
images.
Abstract: In our modern world, more physical transactions are being substituted by electronic transactions (i.e. banking, shopping, and payments), many businesses and companies are performing most of their operations through the internet. Instead of having a physical commerce, internet visitors are now adapting to electronic commerce (e-Commerce). The ability of web users to reach products worldwide can be greatly benefited by creating friendly and personalized online business portals. Internet visitors will return to a particular website when they can find the information they need or want easily. Dealing with this human conceptualization brings the incorporation of Artificial/Computational Intelligence techniques in the creation of customized portals. From these techniques, Fuzzy-Set technologies can make many useful contributions to the development of such a human-centered endeavor as e-Commerce. The main objective of this paper is the implementation of a Paradigm for the Intelligent Design and Operation of Human-Computer interfaces. In particular, the paradigm is quite appropriate for the intelligent design and operation of software modules that display information (such Web Pages, graphic user interfaces GUIs, Multimedia modules) on a computer screen. The human conceptualization of the user personal information is analyzed throughout a Cascaded Fuzzy Inference (decision-making) System to generate the User Ascribe Qualities, which identify the user and that can be used to customize portals with proper Web links.
Abstract: Web 2.0 (social networking, blogging and online
forums) can serve as a data source for social science research because
it contains vast amount of information from many different users.
The volume of that information has been growing at a very high rate
and becoming a network of heterogeneous data; this makes things
difficult to find and is therefore not almost useful. We have proposed
a novel theoretical model for gathering and processing data from
Web 2.0, which would reflect semantic content of web pages in
better way. This article deals with the analysis part of the model and
its usage for content analysis of blogs. The introductory part of the
article describes methodology for the gathering and processing data
from blogs. The next part of the article is focused on the evaluation
and content analysis of blogs, which write about specific trend.
Abstract: With the proliferation of World Wide Web,
development of web-based technologies and the growth in web
content, the structure of a website becomes more complex and web
navigation becomes a critical issue to both web designers and users.
In this paper we define the content and web pages as two important
and influential factors in website navigation and paraphrase the
enhancement in the website navigation as making some useful
changes in the link structure of the website based on the
aforementioned factors. Then we suggest a new method for
proposing the changes using fuzzy approach to optimize the website
architecture. Applying the proposed method to a real case of Iranian
Civil Aviation Organization (CAO) website, we discuss the results of
the novel approach at the final section.
Abstract: This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) to solve the Web pages classification problems. We apply Inductive logic programming (ILP) as a strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner in order to boost the performance of the weak learner of ICT. We compare the result with the supervised Naive Bayes, which is the well-known algorithm for the text classification problem. The performance of our learning algorithm is also compare with other semi-supervised learning algorithms which are Co-Training and EM. The experimental results show that ICT algorithm outperforms those algorithms and the performance of the weak learner can be enhanced by ILP system.
Abstract: The explosive growth of World Wide Web has posed
a challenging problem in extracting relevant data. Traditional web
crawlers focus only on the surface web while the deep web keeps
expanding behind the scene. Deep web pages are created
dynamically as a result of queries posed to specific web databases.
The structure of the deep web pages makes it impossible for
traditional web crawlers to access deep web contents. This paper,
Deep iCrawl, gives a novel and vision-based approach for extracting
data from the deep web. Deep iCrawl splits the process into two
phases. The first phase includes Query analysis and Query translation
and the second covers vision-based extraction of data from the
dynamically created deep web pages. There are several established
approaches for the extraction of deep web pages but the proposed
method aims at overcoming the inherent limitations of the former.
This paper also aims at comparing the data items and presenting them
in the required order.
Abstract: The third phase of web means semantic web requires many web pages which are annotated with metadata. Thus, a crucial question is where to acquire these metadata. In this paper we propose our approach, a semi-automatic method to annotate the texts of documents and web pages and employs with a quite comprehensive knowledge base to categorize instances with regard to ontology. The approach is evaluated against the manual annotations and one of the most popular annotation tools which works the same as our tool. The approach is implemented in .net framework and uses the WordNet for knowledge base, an annotation tool for the Semantic Web.
Abstract: With the enormous growth on the web, users get easily
lost in the rich hyper structure. Thus developing user friendly and
automated tools for providing relevant information without any
redundant links to the users to cater to their needs is the primary task
for the website owners. Most of the existing web mining algorithms
have concentrated on finding frequent patterns while neglecting the
less frequent one that are likely to contain the outlying data such as
noise, irrelevant and redundant data. This paper proposes new
algorithm for mining the web content by detecting the redundant
links from the web documents using set theoretical(classical
mathematics) such as subset, union, intersection etc,. Then the
redundant links is removed from the original web content to get the
required information by the user..
Abstract: Advent enhancements in the field of computing have
increased massive use of web based electronic documents. Current
Copyright protection laws are inadequate to prove the ownership for
electronic documents and do not provide strong features against
copying and manipulating information from the web. This has
opened many channels for securing information and significant
evolutions have been made in the area of information security.
Digital Watermarking has developed into a very dynamic area of
research and has addressed challenging issues for digital content.
Watermarking can be visible (logos or signatures) and invisible
(encoding and decoding). Many visible watermarking techniques
have been studied for text documents but there are very few for web
based text. XML files are used to trade information on the internet
and contain important information. In this paper, two invisible
watermarking techniques using Synonyms and Acronyms are
proposed for XML files to prove the intellectual ownership and to
achieve the security. Analysis is made for different attacks and
amount of capacity to be embedded in the XML file is also noticed.
A comparative analysis for capacity is also made for both methods.
The system has been implemented using C# language and all tests are
made practically to get the results.