Abstract: The various types of frequent pattern discovery
problem, namely, the frequent itemset, sequence and graph mining
problems are solved in different ways which are, however, in certain
aspects similar. The main approach of discovering such patterns can
be classified into two main classes, namely, in the class of the levelwise
methods and in that of the database projection-based methods.
The level-wise algorithms use in general clever indexing structures
for discovering the patterns. In this paper a new approach is proposed
for discovering frequent sequences and tree-like patterns efficiently
that is based on the level-wise issue. Because the level-wise
algorithms spend a lot of time for the subpattern testing problem, the
new approach introduces the idea of using automaton theory to solve
this problem.
Abstract: Mining frequent tree patterns have many useful
applications in XML mining, bioinformatics, network routing, etc.
Most of the frequent subtree mining algorithms (i.e. FREQT,
TreeMiner and CMTreeMiner) use anti-monotone property in the
phase of candidate subtree generation. However, none of these
algorithms have verified the correctness of this property in tree
structured data. In this research it is shown that anti-monotonicity
does not generally hold, when using weighed support in tree pattern
discovery. As a result, tree mining algorithms that are based on this
property would probably miss some of the valid frequent subtree
patterns in a collection of trees. In this paper, we investigate the
correctness of anti-monotone property for the problem of weighted
frequent subtree mining. In addition we propose W3-Miner, a new
algorithm for full extraction of frequent subtrees. The experimental
results confirm that W3-Miner finds some frequent subtrees that the
previously proposed algorithms are not able to discover.
Abstract: Generalized Center String (GCS) problem are
generalized from Common Approximate Substring problem
and Common substring problems. GCS are known to be
NP-hard allowing the problems lies in the explosion of
potential candidates. Finding longest center string without
concerning the sequence that may not contain any motifs is
not known in advance in any particular biological gene
process. GCS solved by frequent pattern-mining techniques
and known to be fixed parameter tractable based on the
fixed input sequence length and symbol set size. Efficient
method known as Bpriori algorithms can solve GCS with
reasonable time/space complexities. Bpriori 2 and Bpriori
3-2 algorithm are been proposed of any length and any
positions of all their instances in input sequences. In this
paper, we reduced the time/space complexity of Bpriori
algorithm by Constrained Based Frequent Pattern mining
(CBFP) technique which integrates the idea of Constraint
Based Mining and FP-tree mining. CBFP mining technique
solves the GCS problem works for all center string of any
length, but also for the positions of all their mutated copies
of input sequence. CBFP mining technique construct TRIE
like with FP tree to represent the mutated copies of center
string of any length, along with constraints to restraint
growth of the consensus tree. The complexity analysis for
Constrained Based FP mining technique and Bpriori
algorithm is done based on the worst case and average case
approach. Algorithm's correctness compared with the
Bpriori algorithm using artificial data is shown.