Abstract: The various types of frequent pattern discovery
problem, namely, the frequent itemset, sequence and graph mining
problems are solved in different ways which are, however, in certain
aspects similar. The main approach of discovering such patterns can
be classified into two main classes, namely, in the class of the levelwise
methods and in that of the database projection-based methods.
The level-wise algorithms use in general clever indexing structures
for discovering the patterns. In this paper a new approach is proposed
for discovering frequent sequences and tree-like patterns efficiently
that is based on the level-wise issue. Because the level-wise
algorithms spend a lot of time for the subpattern testing problem, the
new approach introduces the idea of using automaton theory to solve
this problem.
Abstract: The goal of data mining algorithms is to discover
useful information embedded in large databases. One of the most
important data mining problems is discovery of frequently occurring
patterns in sequential data. In a multidimensional sequence each
event depends on more than one dimension. The search space is quite
large and the serial algorithms are not scalable for very large
datasets. To address this, it is necessary to study scalable parallel
implementations of sequence mining algorithms.
In this paper, we present a model for multidimensional sequence
and describe a parallel algorithm based on data parallelism.
Simulation experiments show good load balancing and scalable and
acceptable speedup over different processors and problem sizes and
demonstrate that our approach can works efficiently in a real parallel
computing environment.