Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern

Sequential pattern mining is a challenging data mining task with a wide range of applications, one of which is mining patterns from weblogs. Weblogs today are highly dynamic, and some of their contents may become obsolete over time. In addition, users may change the threshold value frequently during the mining process until they obtain the required output or find interesting rules. Some recently proposed algorithms for mining weblogs build the tree with two scans and consume considerable time and space. In this paper, we build the Revised PLWAP tree with Non-frequent Items (RePLNI-tree) for all items in a single scan. While mining sequential patterns, the links related to non-frequent items are not considered. Hence, it is not necessary to delete nodes or maintain their information when the tree is revised to mine updated transactions. The algorithm supports both incremental and interactive mining: patterns need not be recomputed each time the weblog is updated or the minimum support is changed. The proposed tree performs well even when the incremental database is more than 50% of the size of the existing one. For evaluation, we used a benchmark weblog dataset and found the performance of the proposed tree encouraging compared with some recently proposed approaches.
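The idea of keeping every item in a tree built with one scan, and deciding which items count as frequent only at mining time, can be illustrated with a small sketch. This is not the RePLNI-tree itself: it is a plain prefix tree without PLWAP position codes or header links, and the names `SequenceTree`, `insert`, and `mine` are hypothetical, chosen only for this illustration. It assumes each access sequence is a list of item labels and that support is counted once per sequence.

```python
from collections import Counter


class Node:
    """One tree node: an item label, a count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class SequenceTree:
    """Simplified prefix tree over access sequences (illustrative only)."""

    def __init__(self):
        self.root = Node()
        self.item_support = Counter()  # number of sequences containing each item

    def insert(self, sequence):
        """Add one sequence; all items are stored, frequent or not."""
        self.item_support.update(set(sequence))
        node = self.root
        for item in sequence:
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child

    def _first_occurrences(self, roots, item):
        """Nodes labelled `item` that are reached first below each root."""
        found = []
        stack = [c for r in roots for c in r.children.values()]
        while stack:
            node = stack.pop()
            if node.item == item:
                found.append(node)  # stop here: deeper copies belong to the same sequences
            else:
                stack.extend(node.children.values())
        return found

    def mine(self, min_support):
        """Return {pattern tuple: support} for the given threshold."""
        # Only items that are frequent under the current threshold are
        # tried as pattern extensions; other items are simply passed over.
        frequent = sorted(i for i, c in self.item_support.items() if c >= min_support)
        patterns = {}

        def grow(prefix, roots):
            for item in frequent:
                nodes = self._first_occurrences(roots, item)
                support = sum(n.count for n in nodes)
                if support >= min_support:
                    pattern = prefix + (item,)
                    patterns[pattern] = support
                    grow(pattern, nodes)

        grow((), [self.root])
        return patterns


# Initial weblog: a single pass over the sequences builds the tree.
tree = SequenceTree()
for seq in [["a", "b", "c"], ["a", "c"], ["a", "b", "d"], ["b", "c"]]:
    tree.insert(seq)
initial = tree.mine(min_support=2)

# Incremental update: new sequences are inserted into the same tree.
for seq in [["a", "d"], ["d"]]:
    tree.insert(seq)
# Interactive mining: the threshold changes, the tree is reused as-is.
updated = tree.mine(min_support=3)
```

Because the non-frequent item `d` was kept in the tree from the start, it can become frequent after the update without any rebuild, and changing `min_support` only changes which items are tried during mining.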




