Cost-Based XQuery Optimization

XTC provides an extensible, rule-based, and cost-based XML query optimization framework, that serves as a basic testbed for exploring how and whether established techniques of relational cost-based query optimization (e.g., reordering of join operators) can be reused and which new techniques have to be developed to make a significant contribution for accelerating query execution. Using the best practices and an appropriate cost model that will be developed using this framework, it can be turned into a stable cost-based XML query optimizer in the future.


Andreas M. Weiner

XQuery Evaluation in XTC

The XTC XQuery Optimization Process

The Figure on the left-hand side shows the different stages an XQuery statement traverses during cost-based query optimization. Initially, the XQuery expression is parsed and mapped to an Abstract Syntax Tree (AST). Next, normalization maps the AST to a canonical representation according to the formal XQuery semantics and, therefore, removes all syntactic sugar. Afterwards, static type checking allows for type inference. The simplification step removes redundant parts of the query and prepares it for a translation to the XML Query Graph Modell (XQGM)—an extended version of the seminal Query Graph Model (QGM), which is well-known from the the relational world. This representation serves as our logical XQuery algebra, which relies on nested tuples. Until the present day, our algebra supports a significant subset of the XQuery language, e.g, FLWOR expressions, path expressions, comparison and positional predicates, quantifications, and node-construction expressions.

Every XQGM instance is passed on to our query optimization framework, which selects the most promising plan based on our cost model. When the optimal plan is found, it is mapped onto a Query Execution Plan (QEP) that serves as input for the final stage, which runs the QEP and returns the final result to the user.







Visually Explaining Cost-Based XQuery Optimization

The query optimizer of XTC is accompanied by a visual explain tool that allows for a configuration of the query optimizer. Because our query optimizer is completely rule-based, every rewrite rule that can be applied by the plan generator, can be switched on and off—even during runtime. Furthermore, different search strategies (e.g., Dynamic Programming or Simulated Annealing) can be employed. To immediately see the impact of a larger or smaller search space (by activating or deactivating rewrite rules) and the optimization result gained using a specific search strategy, we provide a powerful visual explain tool (shown in the Illustration below).

This tool allows to track the complete optimization process from the beginning to its very end. You can see all logical query rewrites performed during the XQuery-to-Algebra mapping phase as well as the final Query Execution Plan (QEP) gained by cost-based optimization. After execution, the QEP is additionally annotated with runtime information, e.g., the total number of tuples consumed by each operator as well as the estimated and actual selectivity of predicates.


The XTC Visual Explain Tool

XQuery Optimization Example

Let's have a look how query optimization works for XMark query Q8:

let $auction := doc("auction.xml") return
for $p in $auction/site/people/person
let $a :=
for $t in $auction/site/closed_auctions/closed_auction
where $t/buyer/@person = $p/@id
return $t
return <item person="{$p/name/text()}">{count($a)}</item>


On the right-hand side, you can see the corresponding XQGM instance (by clicking on the image, you can get a full-screen view).
All XQGM operators are connected to tuple variables that have an FOR (F) or LET (L) quantification. The solid lines connecting QGM operators describe the data flow. The dotted line associated with the top-most operator's LET-quantified tuple variable describes a context-dependent evaluation of the left subtree. Each path step is evaluated using a logical Structural Join operator using the structural relationshop (e.g., child or descendant) as predicate.All Structural Join operators consume tuple streams provided by the various access operators.

After cost-based query optimization, we get a Query Execution Plan (QEP), which is shown below. Almost all structural relationships are now evaluated using path indexes serving as a kind of materialzed views on specific paths in the document. The remaining path steps are evaluated using physical Structural Joins (we actually implemented the famous StackTree operator). Due to the DeweyID node labeling scheme, our physical Structural Join operator can evaluate all XPath axes, instead of only child and descendant.

Here, the optimizer has chosen to use element index accesses to get all "buyer" and "name" element nodes instead of costly scanning the complete document. By using the content index for getting all content-node values,  expensive accesses to the document are not necessary anymore.





Sebastian Bächle, Theo Härder, Volker Höfner, Joachim Klein, Yi Ou, Steffen Reithermann, Daniel Schall, Karsten Schmidt and Andreas M. Weiner
"Von der Torfabrik zur Denkfabrik" Bericht zur 14. Fachtagung "Datenbanksysteme für Business, Technologie und Web"
Datenbank-Spektrum, 11(2), pp. 135—140
Springer, August 2011
Andreas M. Weiner
Cost-Based XQuery Optimization in Native XML Database Systems
Ph.D. Thesis, University of Kaiserslautern, Verlag Dr. Hut, München July 2011
ISBN: 978-3-8439-0071-3
Andreas M. Weiner
Advanced Cardinality Estimation in the XML Query Graph Model
In: Proc. 14. GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW 2011), LNI P - 180, Kaiserslautern, Germany, pp. 207-226
March 2011


Andreas M. Weiner and Theo Härder
An Integrative Approach to Query Optimization in Native XML Database Management Systems
In: Proc. IDEAS, Montreal, QC, CA
August 2010
Andreas M. Weiner, Theo Härder and Renato Oliveira da Silva
Visualizing Cost-Based XQuery Optimization
In: Proc. ICDE, Long Beach, California, USA, pp. 1165-1168
IEEE, March 2010
ISBN: 978-1-4244-5444-0
Andreas M. Weiner and Theo Härder
Advanced Applications and Structures in XML Processing: Label Streams, Semantics Utilization, and Data Query Technologies
A Framework for Cost-Based Query Optimization in Native XML Database Management Systems, pp. 160-182
C. Li and T. W. Ling (Ed.),
IGI Global, January 2010
ISBN: 1-61520-727-9
Theo Härder, Christian Mathis, Sebastian Bächle, Karsten Schmidt and Andreas M. Weiner
Essential Performance Drivers in Native XML DBMSs (keynote paper)
In: Proc. Current Trends in Theory and Practice of Computer Science (Sofsem 2010), LNCS 5901, Špindlerův Mlýn, Czech Republic, pp. 29-46
Springer, January 2010


Andreas M. Weiner and Theo Härder
Using Structural Joins and Holistic Twig Joins for Native XML Query Optimization
In: Proc. ADBIS, LNCS 5739, Riga, Latvia, pp. 149-163
September 2009
Andreas M. Weiner
Framework-Based Development and Evaluation of Cost-Based Native XML Query Optimization Techniques
In: Proc. VLDB Ph.D. Workshop, Lyon, France
August 2009
Andreas M. Weiner, Christian Mathis, Theo Härder and Caesar Ralf Franz Hoppen
Now it’s Obvious to The Eye—Visually Explaining XQuery Evaluation in a Native XML Database Management System
In: Proc. BTW, LNI P-144, Münster, Germany, pp. 616-619
March 2009


Christian Mathis, Andreas M. Weiner, Theo Härder and Caesar Ralf Franz Hoppen
XTCcmp, XQuery Compilation on XTC
PVLDB, 1(2), pp. 1400-1403
August 2008
Andreas M. Weiner, Christian Mathis and Theo Härder
Towards Cost-based Query Optimization in Native XML Database Management Systems
In: Proc. SYRCoDIS, CEUR Workshop Proceedings, St. Petersburg, Russia, pp. 15-26
May 2008
Previous | 1, 2 | Next
Export as: