Tracking Activity of Real Individuals in Web Logs

This paper describes an enhanced cookie-based method for counting web site visitors, implemented in a web log processing system built for the ambitious goal of producing countrywide statistics on the browsing habits of real human individuals. The focus is on a new, more efficient way of detecting the human beings behind web users by placing different identifiers on client computers. We briefly introduce the processing system, which is designed to handle the massive volume of records continuously gathered from the most important content providers in Hungary. We conclude with statistics over different time spans that compare the efficiency of several visitor counting methods with the one presented here, together with charts on content providers and web usage based on real data recorded in 2007.



