Query Optimization Techniques for XML Databases

Over the past few years, XML (eXtensible Mark-up Language) has emerged as the standard for information representation and data exchange over the Internet. This paper provides a kick-start for new researches venturing in XML databases field. We survey the storage representation for XML document, review the XML query processing and optimization techniques with respect to the particular storage instance. Various optimization technologies have been developed to solve the query retrieval and updating problems. Towards the later year, most researchers proposed hybrid optimization techniques. Hybrid system opens the possibility of covering each technology-s weakness by its strengths. This paper reviews the advantages and limitations of optimization techniques.




References:
[1] S. Abiteboul et al, "The Lorel Query Language for Semistructed Data,
Journal of Digital Libraries", Vol 1, No 1, 1997, pp. 68-88.
[2] J. Robie, J. Lapp, D. Schach, XML Query Language (XQL). Available
http://www.w3.org/TandS/QL/QL98/pp/xql.html
[3] S. Ceri et al, XML-GL : A Graphical Language for Querying and
Reshaping XML Documents. Available
http://www.w3.org/TandS/QL/QL98/pp/xml-gl.html
[4] W3C, XML Path Language (XPath). Available
http://www.w3.org/TR/xpath-datamodel/
[5] W3C, XML Query (XQuery). Available
http://www.w3.org/XML/XQuery
[6] D. Zhang, Y. Dong, "A Data Model and Algebra for the Web",
Proceeding 10th International Workshop on Database and Expert System
Application, IEEE Computer Society, 1999, pp. 711-714.
[7] F. Frasincar, G. Houben, C. Pau, "XAL : An Algebra for XML Query
Optimization", 13th Australasian Database Conference, 2002, pp. 49-56.
[8] V. Christophides, S. Cluet, J. Simeon, "On Wrapping Query Languages
and Efficient XML Integration", ACM SIGMOD International
Conference on Management of Data, ACM Press, 2000, pp. 141-152.
[9] T. Bray, J. Paoli, C. Sperberg-McQueen, Extensible markup language
(XML) 1.0, Technical report, W3C Recommendation, 1998
[10] L. Shan, Y. Rao, "A Performance Evaluation of Storing XML Data in
Relational Database Management Systems", WIDM 2001, pp. 31-38.
[11] M. Atay, Y. Sun, D. Liu, S. Lu, F. Fotouhi, "Mapping XML Data To
Relational Data : A DOM-Based Approach", Proc. of the 8th IASTED
International Conference on Internet and Multimedia Systems and
Applications, 2004, pp. 59-64.
[12] T.S. Chung, H-J Kim, "Techniques for the evaluation of XML queries :
a survey", ACM Data And Knowledge Engineering 46, 2003, pp. 225-
246.
[13] M. Garafalakis, A. Gionis, R. Rastpgo, S. Seshadri, K. Shim, "XTRACT
: a system for extracting document type descriptors from XML
documents", Proceeding of the ACM SIGMOD Int. Conference on the
Management of Data, 2000, pp. 165-176.
[14] R. Bourret, XML Database Products. Available
http://www.rpbourret.com/xml/XMLDatabaseProds.htm
[15] J. McHugh & J. Widom, "Query Optimization for XML", Proceeding
25th International Conference on Very Large Databases, 1999, pp. 315-
326.
[16] R. Goldman & J. Widom, "Data Guides : enabling query formulation
and optimization in semistructured database", Proceeding of VLDB,
1997,pp. 436-445.
[17] T. Milo & D. Suciu, "Index structures for path expression", Proceeding
of 7th Int. Conference on Database Theory, 1999, pp. 277-295.
[18] R. Kaushik, D. Shenoy, P. Bohannon, E. Gudes, "Exploiting Local
Similarity to Efficiently Index Paths in Graph-Structured Data",
Proceeding of Int. Conference on Data Engineering, 2002, pp. 129-140.
[19] B. F. Cooper, N. Sample, M.J. Franklin, G.R. Hjaltason, M. Shadmon,
"A Fast Index for Semistructured Data", Proceeding of 27th VLDB
Conference, 2001, pp. 341-350.
[20] C.W. Chung, J.K. Min, K. Shim, "APEX : An Adaptive Path Index for
XML data", ACM SIGMOD, 2002, pp. 121-132.
[21] C. Zhang, J. Naughton, D. DeWitty, Q. Luo, G. Lohman, "On
Supporting Containment Queries in Relational Database Management
Systems", ACM SIGMOD, 2001, pp. 425-436.
[22] J. Kim & H-J Kim, "Efficient processing of regular path joins using
PID", Information and Software Technology 45, 2003, pp. 241-251.
[23] K. Michal, P. Jaroslav, S. Vaclav, "Indexing XML Data with UB trees",
ADBIS, 2002, pp. 155-164.
[24] D. Adam, Data Structures and Algorithms in C++, 2001, Thomson
Learning
[25] R. Agrawal, R. Srikant, "Mining sequential patterns", Proceeding of the
11th Int. Conference on Data Engineering, 1995, pp. 3-14.
[26] J. Lu & T.W. Ling, "Labeling and Querying Dynamic XML Trees",
APWeb, LNCS 3007, 2004, pp. 180-189.
[27] P.F. Dietz, "Maintaining order in a linked list", Proceeding of the 14th
Annual ACM Symposium on Theory of Computing, 1982, pp. 122-127.
[28] Q. Li & B. Moon, "Indexing and Querying XML Data for Regular Path
Expressions", Proceeding of 27th VLDB Conference, 2001, pp. 361-370.
[29] W.E Kimber, "HyTime and SGML : Understanding the HyTime HYQ
Query Language", Technical Report Version 1.1, IBM Corporation,
1993.
[30] E. Cohen, H. Kaplan, T. Milo, "Labeling Dynamic XML Trees",
Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium
on Principles of database systems, 1992, pp. 272-281.
[31] H. Kaplan, T. Milo, R. Shabo, "A Comparison of Labeling Schemes for
Ancestor Queries", Proceedings of the thirteenth annual ACM-SIAM
symposium on Discrete algorithms, 2002, pp. 954-963.
[32] X. Wu, M.L. Lee, W. Hsu, "A Prime Number Labeling Scheme for
Dynamic Ordered XML Trees", Proceedings of the 20th Int Conference
on Data Engineering, 2004.