In this paper, we propose an improvement of pattern
growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree
framework and separator database to reduce the execution time and
memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result
shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.
[1] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Journal Intelligent Systems, vol. 9, No.1, 1997, pp. 33 - 56.
[2] R. Agrawal and R. Srikant, "Mining Sequential Patterns: Generalization and Performance Improvements," Research Report RJ 9994, IBM Almaden Research Center, San Jose, California, December 1995.
[3] J. Han, and M. Kamber, Data Mining: Concepts and Techniques. CA:
Prentice Hall, 2002, ch. 5.
[4] J. Pei, et al, "Mining Sequential Patterns by Pattern Growth: The
PrefixSpan Approach," IEEE Transaction on Knowledge and Data
Engineering, vol. 16, no. 11, pp.14240-1440, Nov. 2004.
[5] J. Han and J. Pei, "Mining Frequent Patterns by Pattern-Growth:
Methodology and Implications", ACM SIGKDD Explorations (Special
Issue on Scalable Data Mining Algorithms), vol. 2, no. 2, pp.14-20,
December 2000.
[6] M. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent
Sequences," Journal Machine Learning, vol. 42, nos. 1-2, 2001.
[7] L.M. Yen and L. S.Y. Lee, "Fast discovery of sequential patterns
through memory indexing and database partitioning," Journal
Information Science and Engineering, vol. 21, pp. 109-128, 2005.
[8] J. Shirazi, Java™ Performance Tuning. CA:O-Reilly, 2003.
[1] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Journal Intelligent Systems, vol. 9, No.1, 1997, pp. 33 - 56.
[2] R. Agrawal and R. Srikant, "Mining Sequential Patterns: Generalization and Performance Improvements," Research Report RJ 9994, IBM Almaden Research Center, San Jose, California, December 1995.
[3] J. Han, and M. Kamber, Data Mining: Concepts and Techniques. CA:
Prentice Hall, 2002, ch. 5.
[4] J. Pei, et al, "Mining Sequential Patterns by Pattern Growth: The
PrefixSpan Approach," IEEE Transaction on Knowledge and Data
Engineering, vol. 16, no. 11, pp.14240-1440, Nov. 2004.
[5] J. Han and J. Pei, "Mining Frequent Patterns by Pattern-Growth:
Methodology and Implications", ACM SIGKDD Explorations (Special
Issue on Scalable Data Mining Algorithms), vol. 2, no. 2, pp.14-20,
December 2000.
[6] M. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent
Sequences," Journal Machine Learning, vol. 42, nos. 1-2, 2001.
[7] L.M. Yen and L. S.Y. Lee, "Fast discovery of sequential patterns
through memory indexing and database partitioning," Journal
Information Science and Engineering, vol. 21, pp. 109-128, 2005.
[8] J. Shirazi, Java™ Performance Tuning. CA:O-Reilly, 2003.
@article{"International Journal of Information, Control and Computer Sciences:63876", author = "Dhany Saputra and Dayang R. A. Rambli and Oi Mean Foong", title = "Mining Sequential Patterns Using I-PrefixSpan", abstract = "In this paper, we propose an improvement of pattern
growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree
framework and separator database to reduce the execution time and
memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result
shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.", keywords = "ArrayList, ArrayIntList, minimum support,sequence database, sequential patterns.", volume = "1", number = "11", pages = "3715-6", }