XML Retrieval


XML is a popular format for storing all kinds of data, from database-like records to textual documents. Of particular interest to us are large collections of long texts, for example in digital libraries [Dop06]. When thousands of books are available in electronic form, good search support can give a digital library a significant advantage over traditional paper-based libraries. As web search engines demonstrate, it is feasible to search vast collections of text in reasonable time.

Compared to web search engines, however, users of digital libraries can (and should) have higher expectations: a book is simply too long to be a suitable retrieval result. Even if users know that the information they are looking for is somewhere in a 300-page book, they still have to find the most relevant passage in it. This task should be delegated to the search engine as far as possible, and the semistructured XML format supports this well, because the markup already divides a document into chapters, sections, and paragraphs.

The aim of our project is to develop an XML retrieval engine that finds not only the most relevant documents, but also the most relevant parts of these documents. If a single section satisfies the user's information need, that section should be returned rather than the complete book.
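To make the idea of returning a part instead of the whole document concrete, here is a minimal Python sketch. It is not the project's actual engine: the scoring rule (the share of an element's words that are query terms, counted only if every query term occurs in the element), the set of retrievable tags, and the sample document are all assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: only these element types are worth returning to the user.
RETRIEVABLE_TAGS = {"book", "chapter", "section"}

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<book>
  <title>Digital Libraries</title>
  <chapter>
    <title>Storage</title>
    <section><p>Books are stored on disk.</p></section>
  </chapter>
  <chapter>
    <title>Search</title>
    <section><p>Element retrieval returns the relevant section.</p></section>
    <section><p>Indexing is covered elsewhere in detail here.</p></section>
  </chapter>
</book>
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_element(element, query_terms):
    """Illustrative score: query-term density of the element's full text.

    Returns 0.0 unless every query term occurs somewhere in the element.
    """
    tokens = tokenize(" ".join(element.itertext()))
    if not tokens or not query_terms:
        return 0.0
    counts = Counter(tokens)
    if any(counts[term] == 0 for term in query_terms):
        return 0.0
    return sum(counts[term] for term in query_terms) / len(tokens)


def best_element(xml_text, query):
    """Return (element, score) for the best-scoring retrievable element,
    or (None, 0.0) if no element matches the query."""
    root = ET.fromstring(xml_text)
    query_terms = tokenize(query)
    scored = [
        (score_element(elem, query_terms), elem)
        for elem in root.iter()
        if elem.tag in RETRIEVABLE_TAGS
    ]
    if not scored:
        return None, 0.0
    score, elem = max(scored, key=lambda pair: pair[0])
    return (elem, score) if score > 0 else (None, 0.0)


if __name__ == "__main__":
    elem, score = best_element(SAMPLE, "element retrieval")
    print(elem.tag, round(score, 3))
```

Because the score is normalized by element length, the enclosing chapter and book also contain both query terms but score lower than the single section that is actually about the topic. A real engine would use a more robust ranking function; the sketch only shows why element-level scoring can prefer a section over the complete book.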


Philipp Dopichaj


The Initiative for the Evaluation of XML Retrieval (INEX) provides a testbed for evaluating the effectiveness of XML retrieval methods. We participated and submitted retrieval runs in 2005, 2006, and 2007. Furthermore, we provided relevance assessments in 2004.

