Concurrency in Web Access Patterns Mining

Web usage mining is an application of data mining that provides insight into customer behaviour on the Internet. An important technique for discovering user access and navigation trails is sequential patterns mining. A key challenge in web access patterns mining is the discovery of richly structured patterns. This paper proposes a novel model, the Web Access Patterns Graph (WAP-Graph), to represent graphically all of the access patterns found by web mining. The WAP-Graph also motivates the search for new structural relation patterns, namely Concurrent Access Patterns (CAP), to identify and predict more complex web page requests. Corresponding CAP mining and modelling methods are proposed and shown to be effective in finding and representing concurrency between access patterns on the web. Experiments on large-scale synthetic sequence data as well as real web access data demonstrate that CAP mining provides a powerful method for structural knowledge discovery, which can be visualised through the CAP-Graph model.



