A Two-Step Approach for Tree-structured XPath Query Reduction

XML data consists of a very flexible tree-structure which makes it difficult to support the storing and retrieving of XML data. The node numbering scheme is one of the most popular approaches to store XML in relational databases. Together with the node numbering storage scheme, structural joins can be used to efficiently process the hierarchical relationships in XML. However, in order to process a tree-structured XPath query containing several hierarchical relationships and conditional sentences on XML data, many structural joins need to be carried out, which results in a high query execution cost. This paper introduces mechanisms to reduce the XPath queries including branch nodes into a much more efficient form with less numbers of structural joins. A two step approach is proposed. The first step merges duplicate nodes in the tree-structured query and the second step divides the query into sub-queries, shortens the paths and then merges the sub-queries back together. The proposed approach can highly contribute to the efficient execution of XML queries. Experimental results show that the proposed scheme can reduce the query execution cost by up to an order of magnitude of the original execution cost.




References:
[1] Quanzhong Li and Bongki Moon. Indexing and querying XML data for
regular path expressions. In Proc. of the 27th VLDB conference, Rome,
Italy, Sep. 2001.
[2] Chun Zhang, Jeffrey F. Naughton, Qiong Luo, and David J. DeWitt, and
Guy M. Lohman. On supporting containment queries in relational
database management systems. SIGMOD conference, Santa Barbara, CA,
USA, May 2001.
[3] Divesh Srivastava, Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas,
Jinesh M. Patel, and Yuqing Wu. Structural joins: A primitive for efficient
XML query pattern matching. IEEE conference on Data Engineering, San
Jose, USA, Feb. 2002.
[4] Shu-Yao Chien, Zografoula Vagena, Donghui Zhang, vassilis J. Tsotras,
and Carlo Zaniolo. Efficient structural joins on indexed XML documents.
28th VLDB conference, p.263-274, Hong Kong, China, Aug. 2002.
[5] Yuqing Wu, Jignesh M. Patel, and H. V. Jagadish, Structural Join Order
Selection for XML Query Optimization. IEEE conference on Data
Engineering, p. 443-454, Bangalore, India, March 2003.
[6] Nicolas Bruno, Nick Koudas, and Divesh Srivastava, Holistic twig joins:
optimal XML pattern matching, SIGMOD conference, p. 311-321,
Madison, Wisconsin, USA, Jun. 2002.
[7] Haifeng Jiang, Wei Wang, Hongjun Lu, and Jeffrey Xu Yu, Holistic Twig
Joins on Indexed XML Documents, 29th VLDB conference, p. 273-284,
Berlin, Germany, Sep. 2003.
[8] Mary Fernandez and Dan Suciu. Optimizing regular path expressions
using graph schemas. IEEE Conference on Data Engineering, p. 4-13,
Orlando, Florida, Feb. 1998.
[9] Hyoseop Shin, Minsoo Lee, An Efficient Branch Query Rewriting
Algorithm for XML Query Optimization, ODBASE 2005, LNCS 3761,
Springer-Verlag, p. 1629-1639, Agia Napa, Cyprus, October 31 -
November 4, 2005.
[10] Albrecht Schmidt, Florian Waas, Martin L. Kersten, Michael J. Carey,
Ioana Manolescu, Ralph Busse. XMark: A Benchmark for XML Data
Management. In Proc. of the 28th VLDB conference, p. 974-985, Hong
Kong, China, Aug. 2002.
[11] Sleepycat Software Inc., http://www.sleepycat.com.