Cost-Based XQuery Optimization

XTC provides an extensible, rule-based, and cost-based XML query optimization framework, that serves as a basic testbed for exploring how and whether established techniques of relational cost-based query optimization (e.g., reordering of join operators) can be reused and which new techniques have to be developed to make a significant contribution for accelerating query execution. Using the best practices and an appropriate cost model that will be developed using this framework, it can be turned into a stable cost-based XML query optimizer in the future.


Andreas M. Weiner

XQuery Evaluation in XTC

The XTC XQuery Optimization Process

The Figure on the left-hand side shows the different stages an XQuery statement traverses during cost-based query optimization. Initially, the XQuery expression is parsed and mapped to an Abstract Syntax Tree (AST). Next, normalization maps the AST to a canonical representation according to the formal XQuery semantics and, therefore, removes all syntactic sugar. Afterwards, static type checking allows for type inference. The simplification step removes redundant parts of the query and prepares it for a translation to the XML Query Graph Modell (XQGM)—an extended version of the seminal Query Graph Model (QGM), which is well-known from the the relational world. This representation serves as our logical XQuery algebra, which relies on nested tuples. Until the present day, our algebra supports a significant subset of the XQuery language, e.g, FLWOR expressions, path expressions, comparison and positional predicates, quantifications, and node-construction expressions.

Every XQGM instance is passed on to our query optimization framework, which selects the most promising plan based on our cost model. When the optimal plan is found, it is mapped onto a Query Execution Plan (QEP) that serves as input for the final stage, which runs the QEP and returns the final result to the user.







Visually Explaining Cost-Based XQuery Optimization

The query optimizer of XTC is accompanied by a visual explain tool that allows for a configuration of the query optimizer. Because our query optimizer is completely rule-based, every rewrite rule that can be applied by the plan generator, can be switched on and off—even during runtime. Furthermore, different search strategies (e.g., Dynamic Programming or Simulated Annealing) can be employed. To immediately see the impact of a larger or smaller search space (by activating or deactivating rewrite rules) and the optimization result gained using a specific search strategy, we provide a powerful visual explain tool (shown in the Illustration below).

This tool allows to track the complete optimization process from the beginning to its very end. You can see all logical query rewrites performed during the XQuery-to-Algebra mapping phase as well as the final Query Execution Plan (QEP) gained by cost-based optimization. After execution, the QEP is additionally annotated with runtime information, e.g., the total number of tuples consumed by each operator as well as the estimated and actual selectivity of predicates.


The XTC Visual Explain Tool

XQuery Optimization Example

Let's have a look how query optimization works for XMark query Q8:

let $auction := doc("auction.xml") return
for $p in $auction/site/people/person
let $a :=
for $t in $auction/site/closed_auctions/closed_auction
where $t/buyer/@person = $p/@id
return $t
return <item person="{$p/name/text()}">{count($a)}</item>


On the right-hand side, you can see the corresponding XQGM instance (by clicking on the image, you can get a full-screen view).
All XQGM operators are connected to tuple variables that have an FOR (F) or LET (L) quantification. The solid lines connecting QGM operators describe the data flow. The dotted line associated with the top-most operator's LET-quantified tuple variable describes a context-dependent evaluation of the left subtree. Each path step is evaluated using a logical Structural Join operator using the structural relationshop (e.g., child or descendant) as predicate.All Structural Join operators consume tuple streams provided by the various access operators.

After cost-based query optimization, we get a Query Execution Plan (QEP), which is shown below. Almost all structural relationships are now evaluated using path indexes serving as a kind of materialzed views on specific paths in the document. The remaining path steps are evaluated using physical Structural Joins (we actually implemented the famous StackTree operator). Due to the DeweyID node labeling scheme, our physical Structural Join operator can evaluate all XPath axes, instead of only child and descendant.

Here, the optimizer has chosen to use element index accesses to get all "buyer" and "name" element nodes instead of costly scanning the complete document. By using the content index for getting all content-node values,  expensive accesses to the document are not necessary anymore.





Andreas M. Weiner
Anfrageverarbeitung in nativen XML-Datenbankverwaltungssystemen (Query Processing in Native XML Database Management Systems)
In: Proc. Informatiktage, Bonn, Germany, pp. 201-204
March 2008
Andreas M. Weiner, Christian Mathis and Theo Härder
Rules for Query Rewrite in Native XML Databases
In: Proc. EDBT Workshop "Database Technologies for Handling XML Information on the Web (DataX)", ACM International Conference Proceeding Series 261, Nantes, France, pp. 21-26
March 2008
Andreas M. Weiner, Christian Mathis and Theo Härder
Associativity Rules for Native XML Databases
Internal Report
March 2008


Andreas M. Weiner
Plangenerierung zur Anfrageauswertung in nativen XML-Datenbankverwaltungssystemen
Diploma Thesis, University of Kaiserslautern, July 2007
Previous | 1, 2 | Next
Export as: