A New Model for Discovering XML Association Rules from XML Documents

The inherent flexibility of XML in both structure and semantics makes mining XML data a complex task that poses more challenges than traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules from an XML document collection. We use frequent subtree mining techniques directly in the discovery process and preserve the tree structure of the data in the final rules. The frequent subtrees, selected according to a user-provided support threshold, are split into complementary subtrees to form the rules. We explain our model in multiple steps, from data preparation to rule generation.
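
The splitting step can be illustrated with a minimal sketch. It is not the model's actual algorithm. It assumes a simplified representation in which each document, and each subtree pattern, is the set of its root-to-node label paths, and it takes a frequent pattern as given rather than mining it. The helper names (`path_set`, `support`, `split_rules`) and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def path_set(xml_text):
    """Flatten one XML document into the set of its root-to-node label paths.

    Assumption: repeated siblings with the same label collapse into one path.
    """
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return frozenset(paths)


def support(pattern, docs):
    """Fraction of documents whose path set contains every path of the pattern."""
    return sum(pattern <= d for d in docs) / len(docs)


def is_prefix_closed(paths):
    """True if every path's parent path is present, i.e. the paths form a rooted subtree."""
    return all(len(p) == 1 or p[:-1] in paths for p in paths)


def split_rules(pattern, docs, min_conf):
    """Split a frequent pattern into (antecedent, complement, confidence) rules.

    The antecedent is a proper rooted subtree of the pattern; the consequent
    is the remaining paths, kept relative to the same root.
    """
    total = support(pattern, docs)
    if total == 0:
        return []
    rules = []
    items = sorted(pattern)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            antecedent = frozenset(combo)
            if not is_prefix_closed(antecedent):
                continue
            conf = total / support(antecedent, docs)
            if conf >= min_conf:
                rules.append((antecedent, pattern - antecedent, conf))
    return rules


# Hypothetical document collection and an assumed frequent pattern.
docs = [path_set(d) for d in (
    "<order><book/><pen/></order>",
    "<order><book/><pen/></order>",
    "<order><book/></order>",
    "<order><pen/></order>",
)]
pattern = frozenset({("order",), ("order", "book"), ("order", "pen")})
rules = split_rules(pattern, docs, min_conf=0.6)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(conf, 2))
```

Because both sides of each rule are expressed as paths under a common root, the rule retains where each element sits in the tree, which is the property the model aims to preserve. A real frequent subtree miner would also have to handle repeated siblings and ordered or embedded matching, which this sketch leaves out.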



