Automata Theory Approach for Solving Frequent Pattern Discovery Problems

The various types of frequent pattern discovery problem, namely, the frequent itemset, sequence and graph mining problems are solved in different ways which are, however, in certain aspects similar. The main approach of discovering such patterns can be classified into two main classes, namely, in the class of the levelwise methods and in that of the database projection-based methods. The level-wise algorithms use in general clever indexing structures for discovering the patterns. In this paper a new approach is proposed for discovering frequent sequences and tree-like patterns efficiently that is based on the level-wise issue. Because the level-wise algorithms spend a lot of time for the subpattern testing problem, the new approach introduces the idea of using automaton theory to solve this problem.




References:
[1] R. Agrawal and R. Srikant, "Mining sequential patterns," in ICDE -95:
Proceedings of the Eleventh International Conference on Data
Engineering. Washington, DC, USA: IEEE Computer Society, 1995, pp.
3-14.
[2] R. Srikant and R. Agrawal, "Mining sequential patterns: Generalizations
and performance improvements," in Proc. of the 5th International
Conference on Extending Database Technology, 1996.
[3] M. N. Garofalakis, R. Rastogi, and K. Shim, "Spirit: Sequential pattern
mining with regular expression constraints." in VLDB, M. P. Atkinson,
M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, Eds.
Morgan Kaufmann, 1999, pp. 223-234.
[4] M. J. Zaki, "Spade: An efficient algorithm for mining frequent
sequences," Machine Learning, vol. 42, no. 1-2, pp. 31-60, 2001.
[5] J. Han, J. Pei, and B. M.-A. et al., "Freespan: frequent pattern-projected
sequential pattern mining," in KDD -00: Proceedings of the sixth ACM
SIGKDD international conference on Knowledge discovery and data
mining. New York, NY, USA: ACM Press, 2000, pp. 355-359.
[6] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C.
Hsu, "PrefixSpan mining sequential patterns efficiently by prefix
projected pattern growth," in In Proc. of Int. Conf. on Data Engineering,
2001, pp. 215-226.
[7] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, "Sequential pattern mining
using a bitmap representation," in In Proc. of the 8th ACM SIGKDD
Int.Conf. on Knowledge Discovery and Data Mining, 2002, pp. 429-
435.
[8] M. Zaki, "Efficiently mining frequent trees in a forest," in Proc. of the
8th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, July 2002, pp. 71-80.
[9] T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa,
"Efficient substructure discovery from large semi-structured data." In
SDM, R. L. Grossman, J. Han, V. Kumar, H. Mannila, and R. Motwani,
Eds. SIAM, 2002.
[10] U. Rckert and S. Kramer, "Frequent free tree discovery in graph data," in
SAC -04: Proceedings of the 2004 ACM symposium on Applied
computing. New York, NY, USA: ACM Press, 2004, pp. 564-570.