W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees

Mining frequent tree patterns have many useful applications in XML mining, bioinformatics, network routing, etc. Most of the frequent subtree mining algorithms (i.e. FREQT, TreeMiner and CMTreeMiner) use anti-monotone property in the phase of candidate subtree generation. However, none of these algorithms have verified the correctness of this property in tree structured data. In this research it is shown that anti-monotonicity does not generally hold, when using weighed support in tree pattern discovery. As a result, tree mining algorithms that are based on this property would probably miss some of the valid frequent subtree patterns in a collection of trees. In this paper, we investigate the correctness of anti-monotone property for the problem of weighted frequent subtree mining. In addition we propose W3-Miner, a new algorithm for full extraction of frequent subtrees. The experimental results confirm that W3-Miner finds some frequent subtrees that the previously proposed algorithms are not able to discover.




References:
[1] Y. Chi, S. Nijssen, R.R. Muntz, J. N. Kok, "Frequent Subtree Mining An
Overview," Fundamental Informatics, Special Issue on Graph and Tree
Mining, 2005.
[2] M.J. Zaki, "Efficiently Mining Frequent Trees in a Forest: Algorithms
and Applications," in IEEE Transaction on Knowledge and Data
Engineering, vol. 17, no. 8, pp. 1021-1035, 2005.
[3] H. Tan, T.S. Dillon, L. Feng, E. Chang, F. Hadzic, "X3-Miner: Mining
Patterns from XML Database," In Proc. Data Mining '05. Skiathos,
Greece, 2005.
[4] K. Abe, S. Kawasoe, T. Asai, H. Arimura, and S. Arikawa, "Optimized
Substructure Discovery for Semi-structured Data," In Proc. PKDD-02,
1-14, LNAI 2431, 2002.
[5] M. J Zaki,.. Efficient Mining of Trees in the Forest. SIGKDD '02,
Edmonton, Alberta, Canada, ACM. 2002.
[6] M. J. Zaki and C. C. Aggarwal. XRules: An effective structural classifier
for XML data. In Proc. of the 2003 Int. Conf. Knowledge Discovery and
Data Mining, 2003.
[7] T. Asai, H. Arimura, T. Uno, and S. Nakano. Discovering frequent
substructures in large unordered trees. In Proc. of the 6th Intl. Conf. on
Discovery Science, 2003.
[8] Y. Chi, Y. Yang, and R. R. Muntz. Mining frequent rooted trees and free
trees using canonical forms. Technical Report CSD-TR No. 030043,
UCLA, 2003.
[9] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri
Verkamo, "Fast Discovery of Association Rules," Advances in
Knowledge Discovery, and Data Mining, U. Fayyad et al., eds.,pp. 307-
328, Menlo Park, Calif.: AAAI Press, 1996.
[10] M.J. Zaki, "Fast Vertical Mining Using Diffsets," In. Proc. of Int. Conf.
Knowledge Discovery and Data Mining (SIGKDD-03), 2003.
[11] M. Zaki. Efficiently mining frequent embedded unordered trees.
Fundamental Informatics, 65:1-20, 2005.
[12] B. Shapiro and K. Zhang, "Comparing Multiple RNA Secondary
Structures Using Tree Comparisons," Computer Applications in
Biosciences, vol. 6, no. 4, pp. 309-318, 1990.