Improving Content-Oriented XML Retrieval by Applying Structural Patterns
Philipp Dopichaj
Fachbereich Informatik
Technische Universität Kaiserslautern
Gottlieb-Daimler-Straße
D-67663 Kaiserslautern
dopichaj@informatik.uni-kl.de
Abstract:
XML is the perfect format for storing (mostly) textual documents in a
knowledge management system; its flexibility enables users to store both
highly structured data and free text in the same document.
For knowledge management, it is important to be able to search
the free-text parts effectively; users need to find the
information that helps them solve their problem without having to
wade through large amounts of irrelevant information.
Content-oriented XML retrieval addresses this challenge:
in contrast to traditional information retrieval, documents are not
treated as atomic units; instead, individual elements such as
sections or paragraphs can be returned. One implication of this
is that results can overlap (for example, a paragraph and the section
that contains it).
Although overlapping results are undesirable in the final retrieval
result as presented to the user, they can help to improve the quality
of the final result:
We take advantage of overlaps by applying patterns to small subtrees of
the retrieval result (result contexts); matching patterns adjust the
retrieval status values of the involved nodes in order to promote the
best results.
We demonstrate on the INEX 2005 test collection that this postprocessing
can lead to a significant improvement in retrieval quality.
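
The postprocessing idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Result` type, the path encoding of elements, the single "child outscores parent" pattern, the `penalty` factor, and the greedy overlap removal are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    path: tuple   # hypothetical element address, e.g. ("article", "sec[1]", "p[2]")
    rsv: float    # retrieval status value assigned by the base retrieval engine


def is_ancestor(a, b):
    """True if the element at path a is a proper ancestor of the element at path b."""
    return len(a) < len(b) and b[:len(a)] == a


def apply_child_outscores_parent(results, penalty=0.5):
    """Hypothetical pattern: if a retrieved element has a retrieved direct
    child with a higher RSV, scale the parent's RSV down by `penalty`.
    Comparisons use the original RSVs; a new list is returned."""
    adjusted = []
    for r in results:
        children = [c for c in results if c.path[:-1] == r.path]
        if any(c.rsv > r.rsv for c in children):
            adjusted.append(Result(r.path, r.rsv * penalty))
        else:
            adjusted.append(Result(r.path, r.rsv))
    return adjusted


def remove_overlap(results):
    """Greedy overlap removal: walk the results by descending RSV and keep
    an element only if no ancestor or descendant has been kept already."""
    chosen = []
    for r in sorted(results, key=lambda x: x.rsv, reverse=True):
        overlaps = any(
            is_ancestor(c.path, r.path) or is_ancestor(r.path, c.path)
            for c in chosen
        )
        if not overlaps:
            chosen.append(r)
    return chosen


raw = [
    Result(("article", "sec[1]"), 0.6),
    Result(("article", "sec[1]", "p[2]"), 0.8),
    Result(("article", "sec[2]"), 0.5),
]
final = remove_overlap(apply_child_outscores_parent(raw))
```

In this sketch the overlapping results are kept during the pattern step, because the pattern needs both the section and its paragraph to decide which one to promote; overlap is removed only when the final list is assembled.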
Proc. ICEIS 2007, Funchal, June 2007.