Mining Sequential Patterns Using I-PrefixSpan

In this paper, we propose an improvement of pattern growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree framework and separator database to reduce the execution time and memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.




References:
[1] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Journal Intelligent Systems, vol. 9, No.1, 1997, pp. 33 - 56.
[2] R. Agrawal and R. Srikant, "Mining Sequential Patterns: Generalization and Performance Improvements," Research Report RJ 9994, IBM Almaden Research Center, San Jose, California, December 1995.
[3] J. Han, and M. Kamber, Data Mining: Concepts and Techniques. CA:
Prentice Hall, 2002, ch. 5.
[4] J. Pei, et al, "Mining Sequential Patterns by Pattern Growth: The
PrefixSpan Approach," IEEE Transaction on Knowledge and Data
Engineering, vol. 16, no. 11, pp.14240-1440, Nov. 2004.
[5] J. Han and J. Pei, "Mining Frequent Patterns by Pattern-Growth:
Methodology and Implications", ACM SIGKDD Explorations (Special
Issue on Scalable Data Mining Algorithms), vol. 2, no. 2, pp.14-20,
December 2000.
[6] M. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent
Sequences," Journal Machine Learning, vol. 42, nos. 1-2, 2001.
[7] L.M. Yen and L. S.Y. Lee, "Fast discovery of sequential patterns
through memory indexing and database partitioning," Journal
Information Science and Engineering, vol. 21, pp. 109-128, 2005.
[8] J. Shirazi, Java™ Performance Tuning. CA:O-Reilly, 2003.