Lehrgebiet Informationssysteme

FB Informatik

LG IS

AG DBIS

(C) AG DBIS

Hash-Based Structural Join Algorithms

Christian Mathis

Kaiserslautern University of Technology
Dept. of Computer Science (AG DBIS)
P.O. Box 3049, 67653 Kaiserslautern, Germany
e-mail: mathis@informatik.uni-kl.de

Theo Härder

Kaiserslautern University of Technology
Dept. of Computer Science (AG DBIS)
P.O. Box 3049, 67653 Kaiserslautern, Germany
e-mail: haerder@informatik.uni-kl.de

Full paper (PDF version)

Abstract

Algorithms for processing Structural Joins belong to the set of essential building blocks for XML query evaluation. Their design is a difficult task, because they have to satisfy many requirements, e.g., guarantee a linear worst-case runtime; generate a sorted, duplicate-free output; adapt to fiercely varying input sizes and element distributions; enable pipelining; and (probably) more. Therefore, it is not possible to design "the" structural join algorithm. Rather, the provision of different specialized operators, from which the query optimizer can choose, is beneficial for query efficiency. We propose new hash-based structural join algorithms that can process unordered input sequences possibly containing duplicates. We also show that these algorithms can substantially reduce the number of required sort operations on intermediate results for (complex) tree structured queries (twigs).

appears in Proc. of DATAX'06 Workshop, Munich, March 2006.