Abstract: One of the significant issues facing web users is the amount of noise in web data which hinders the process of finding useful information in relation to their dynamic interests. Current research works consider noise as any data that does not form part of the main web page and propose noise web data reduction tools which mainly focus on eliminating noise in relation to the content and layout of web data. This paper argues that not all data that form part of the main web page is of a user interest and not all noise data is actually noise to a given user. Therefore, learning of noise web data allocated to the user requests ensures not only reduction of noisiness level in a web user profile, but also a decrease in the loss of useful information hence improves the quality of a web user profile. Noise Web Data Learning (NWDL) tool/algorithm capable of learning noise web data in web user profile is proposed. The proposed work considers elimination of noise data in relation to dynamic user interest. In order to validate the performance of the proposed work, an experimental design setup is presented. The results obtained are compared with the current algorithms applied in noise web data reduction process. The experimental results show that the proposed work considers the dynamic change of user interest prior to elimination of noise data. The proposed work contributes towards improving the quality of a web user profile by reducing the amount of useful information eliminated as noise.
Abstract: Finding relevant information on the World Wide Web is becoming highly challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. Web access log records all the requests for individual files that the users have requested from the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirement by analyzing the access log file of a website through a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar selling web site by analyzing their access log through Deep Log Analyzer tool. The results show that the maximum number of users is from the United States and that they use Opera 9.8 web browser and the Windows XP operating system.
Abstract: Web Usage Mining is the application of data mining
techniques to find usage patterns from web log data, so as to grasp
required patterns and serve the requirements of Web-based
applications. User’s expertise on the internet may be improved by
minimizing user’s web access latency. This may be done by
predicting the future search page earlier and the same may be prefetched
and cached. Therefore, to enhance the standard of web
services, it is needed topic to research the user web navigation
behavior. Analysis of user’s web navigation behavior is achieved
through modeling web navigation history. We propose this technique
which cluster’s the user sessions, based on the K-medoids technique.
Abstract: Web mining is to discover and extract useful
Information. Different users may have different search goals when
they search by giving queries and submitting it to a search engine.
The inference and analysis of user search goals can be very useful for
providing an experience result for a user search query. In this project,
we propose a novel approach to infer user search goals by analyzing
search web logs. First, we propose a novel approach to infer user
search goals by analyzing search engine query logs, the feedback
sessions are constructed from user click-through logs and it
efficiently reflect the information needed for users. Second we
propose a preprocessing technique to clean the unnecessary data’s
from web log file (feedback session). Third we propose a technique
to generate pseudo-documents to representation of feedback sessions
for clustering. Finally we implement k-medoids clustering algorithm
to discover different user search goals and to provide a more optimal
result for a search query based on feedback sessions for the user.
Abstract: Frequent pattern mining is the process of finding a
pattern (a set of items, subsequences, substructures, etc.) that occurs
frequently in a data set. It was proposed in the context of frequent
itemsets and association rule mining. Frequent pattern mining is used
to find inherent regularities in data. What products were often
purchased together? Its applications include basket data analysis,
cross-marketing, catalog design, sale campaign analysis, Web log
(click stream) analysis, and DNA sequence analysis. However, one of
the bottlenecks of frequent itemset mining is that as the data increase
the amount of time and resources required to mining the data
increases at an exponential rate. In this investigation a new algorithm
is proposed which can be uses as a pre-processor for frequent itemset
mining. FASTER (FeAture SelecTion using Entropy and Rough sets)
is a hybrid pre-processor algorithm which utilizes entropy and roughsets
to carry out record reduction and feature (attribute) selection
respectively. FASTER for frequent itemset mining can produce a
speed up of 3.1 times when compared to original algorithm while
maintaining an accuracy of 71%.
Abstract: With the explosive growth of data available on the
Internet, personalization of this information space become a
necessity. At present time with the rapid increasing popularity of the
WWW, Websites are playing a crucial role to convey knowledge and
information to the end users. Discovering hidden and meaningful
information about Web users usage patterns is critical to determine
effective marketing strategies to optimize the Web server usage for
accommodating future growth. The task of mining useful information
becomes more challenging when the Web traffic volume is enormous
and keeps on growing. In this paper, we propose a intelligent model
to discover and analyze useful knowledge from the available Web
log data.
Abstract: This paper sets forth the possibility and importance about applying Data Mining in Web logs mining and shows some problems in the conventional searching engines. Then it offers an improved algorithm based on the original AprioriAll algorithm which has been used in Web logs mining widely. The new algorithm adds the property of the User ID during the every step of producing the candidate set and every step of scanning the database by which to decide whether an item in the candidate set should be put into the large set which will be used to produce next candidate set. At the meantime, in order to reduce the number of the database scanning, the new algorithm, by using the property of the Apriori algorithm, limits the size of the candidate set in time whenever it is produced. Test results show the improved algorithm has a more lower complexity of time and space, better restrain noise and fit the capacity of memory.
Abstract: This paper describes an enhanced cookie-based
method for counting the visitors of web sites by using a web log
processing system that aims to cope with the ambitious goal of
creating countrywide statistics about the browsing practices of real
human individuals. The focus is put on describing a new more
efficient way of detecting human beings behind web users by placing
different identifiers on the client computers. We briefly introduce our
processing system designed to handle the massive amount of data
records continuously gathered from the most important content
providers of the Hungary. We conclude by showing statistics of
different time spans comparing the efficiency of multiple visitor
counting methods to the one presented here, and some interesting
charts about content providers and web usage based on real data
recorded in 2007 will also be presented.
Abstract: Clustering techniques have been used by many intelligent software agents to group similar access patterns of the Web users into high level themes which express users intentions and interests. However, such techniques have been mostly focusing on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate themes with concentrated themes, and hence build a more sound model of the user behavior. The purpose of this paper is two fold: use distance based clustering methods to recognize overall themes from the Proxy log file, and suggest an efficient cut off levels for the keyword and similarity thresholds which tend to produce more optimal clusters with better focus and efficient size.
Abstract: Web usage mining has become a popular research
area, as a huge amount of data is available online. These data can be
used for several purposes, such as web personalization, web structure
enhancement, web navigation prediction etc. However, the raw log
files are not directly usable; they have to be preprocessed in order to
transform them into a suitable format for different data mining tasks.
One of the key issues in the preprocessing phase is to identify web
users. Identifying users based on web log files is not a
straightforward problem, thus various methods have been developed.
There are several difficulties that have to be overcome, such as client
side caching, changing and shared IP addresses and so on. This paper
presents three different methods for identifying web users. Two of
them are the most commonly used methods in web log mining
systems, whereas the third on is our novel approach that uses a
complex cookie-based method to identify web users. Furthermore we
also take steps towards identifying the individuals behind the
impersonal web users. To demonstrate the efficiency of the new
method we developed an implementation called Web Activity
Tracking (WAT) system that aims at a more precise distinction of
web users based on log data. We present some statistical analysis
created by the WAT on real data about the behavior of the Hungarian
web users and a comprehensive analysis and comparison of the three
methods