Abstract: The various types of frequent pattern discovery
problems, namely frequent itemset, sequence and graph mining,
are solved in different ways that are nevertheless similar in certain
aspects. The main approaches to discovering such patterns fall
into two classes: level-wise methods and database
projection-based methods.
Level-wise algorithms generally rely on clever indexing structures
to discover the patterns. This paper proposes a new approach,
based on the level-wise method, for efficiently discovering
frequent sequences and tree-like patterns. Because level-wise
algorithms spend much of their time on subpattern testing, the
new approach uses automaton theory to solve
this problem.
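
To illustrate what treating subpattern testing as an automaton problem can look like, here is a minimal Python sketch. It is not the algorithm proposed in the paper: the names SubsequenceAutomaton and support are hypothetical, and the sketch assumes plain sequences of single items rather than itemset sequences or tree-like patterns. Each state counts how many pattern elements have been matched so far, and a candidate is contained in a data sequence if the accepting state is reached.

```python
from typing import Hashable, Iterable, Sequence


class SubsequenceAutomaton:
    """Automaton that accepts every sequence containing `pattern`
    as a (not necessarily contiguous) subsequence.

    Hypothetical illustration: state i means "the first i pattern
    elements have been matched".
    """

    def __init__(self, pattern: Sequence[Hashable]) -> None:
        self.pattern = tuple(pattern)
        self.accepting = len(self.pattern)

    def step(self, state: int, symbol: Hashable) -> int:
        # Advance only when the symbol is the next expected pattern element.
        if state < self.accepting and self.pattern[state] == symbol:
            return state + 1
        return state

    def accepts(self, sequence: Iterable[Hashable]) -> bool:
        state = 0
        for symbol in sequence:
            state = self.step(state, symbol)
            if state == self.accepting:
                return True  # stop early once the whole pattern is matched
        return state == self.accepting


def support(pattern: Sequence[Hashable],
            database: Iterable[Sequence[Hashable]]) -> int:
    """Count the data sequences that contain `pattern` as a subsequence."""
    automaton = SubsequenceAutomaton(pattern)
    return sum(1 for sequence in database if automaton.accepts(sequence))
```

In a level-wise setting, a function like support would be called once per candidate at each level. Building one automaton per candidate, as done here, is the simplest choice; sharing states across candidates is a possible refinement that this sketch does not attempt.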

Abstract: This paper describes an enhanced cookie-based
method for counting the visitors of web sites using a web log
processing system that pursues the ambitious goal of
creating countrywide statistics about the browsing practices of real
human individuals. The focus is on a new, more
efficient way of detecting the human beings behind web users by placing
different identifiers on the client computers. We briefly introduce our
processing system, which is designed to handle the massive amount of
data records continuously gathered from the most important content
providers of Hungary. We conclude by presenting statistics over
different time spans that compare the efficiency of multiple visitor
counting methods with the one presented here, along with some
charts about content providers and web usage based on real data
recorded in 2007.

Abstract: Web usage mining has become a popular research
area, as a huge amount of data is available online. These data can be
used for several purposes, such as web personalization, web structure
enhancement and web navigation prediction. However, the raw log
files are not directly usable; they have to be preprocessed in order to
transform them into a suitable format for different data mining tasks.
One of the key issues in the preprocessing phase is to identify web
users. Identifying users based on web log files is not a
straightforward problem, so various methods have been developed.
Several difficulties have to be overcome, such as client-side
caching and changing or shared IP addresses. This paper
presents three different methods for identifying web users. Two of
them are the most commonly used methods in web log mining
systems, whereas the third one is our novel approach that uses a
complex cookie-based method to identify web users. Furthermore, we
also take steps towards identifying the individuals behind the
impersonal web users. To demonstrate the efficiency of the new
method, we developed an implementation called the Web Activity
Tracking (WAT) system, which aims at a more precise distinction of
web users based on log data. We present statistical analyses
produced by WAT on real data about the behavior of Hungarian
web users, together with a comprehensive analysis and comparison
of the three methods.