This paper describes an enhanced cookie-based
method for counting the visitors of web sites by using a web log
processing system that aims to cope with the ambitious goal of
creating countrywide statistics about the browsing practices of real
human individuals. The focus is on describing a new, more
efficient way of detecting the human beings behind web users by placing
different identifiers on the client computers. We briefly introduce our
processing system designed to handle the massive amount of data
records continuously gathered from the most important content
providers in Hungary. We conclude by showing statistics over
different time spans that compare the efficiency of multiple visitor
counting methods with the one presented here, along with some
charts about content providers and web usage based on real data
recorded in 2007.
@article{58162,
  author = "Sándor Juhász and Renáta Iváncsy",
  title = "Tracking Activity of Real Individuals in Web Logs",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "This paper describes an enhanced cookie-based
method for counting the visitors of web sites by using a web log
processing system that aims to cope with the ambitious goal of
creating countrywide statistics about the browsing practices of real
human individuals. The focus is on describing a new, more
efficient way of detecting the human beings behind web users by placing
different identifiers on the client computers. We briefly introduce our
processing system designed to handle the massive amount of data
records continuously gathered from the most important content
providers in Hungary. We conclude by showing statistics over
different time spans that compare the efficiency of multiple visitor
counting methods with the one presented here, along with some
charts about content providers and web usage based on real data
recorded in 2007.",
  keywords = "Cookie-based identification, real data, user activity tracking, web auditing, web log processing",
  volume = "1",
  number = "7",
  pages = "2101-6",
}