Conflation Methods and Spelling Mistakes - A Sensitivity Analysis in Information Retrieval


Philipp Dopichaj, Theo Härder

University of Kaiserslautern
P.O. Box 3049, 67653 Kaiserslautern, Germany
e-mail: {dopichaj, haerder}@informatik.uni-kl.de


Abstract:

In some information retrieval scenarios, such as internal help desk systems, texts are entered into the document collection without proofreading. This can result in a relatively high number of spelling mistakes, which can skew the ranking of the documents retrieved for a query or even prevent relevant documents from being retrieved. We address this problem at the conflation stage of the retrieval process and evaluate whether conflation based on n-grams, which is said to be insensitive to misspellings, leads to better retrieval quality than commonly used stemming algorithms. To this end, we run tests on artificially corrupted test collections and examine which characteristics of the queries and the relevant documents influence the relative retrieval quality achieved with the different conflation methods.
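
To illustrate why n-gram conflation is said to tolerate misspellings, the following minimal Python sketch compares words by the overlap of their character bigrams and produces artificial spelling mistakes with single random edits. It is an illustration under stated assumptions, not the method used in the paper: the abstract does not specify the n-gram size, the similarity measure, or the corruption procedure, so the Dice coefficient, the padding character, and the function names `char_ngrams`, `dice`, and `corrupt` are all hypothetical choices.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded at both ends.

    Padding with '_' is an assumption made here so that the first and
    last letters contribute as many n-grams as the inner letters.
    """
    pad = "_" * (n - 1)
    padded = pad + word.lower() + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient of the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def corrupt(word, rng):
    """Apply one random edit: delete, insert, substitute, or transpose.

    This is a hypothetical corruption model; words shorter than two
    characters are returned unchanged.
    """
    if len(word) < 2:
        return word
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    i = rng.randrange(len(word) - 1)  # i + 1 is always a valid index
    letter = rng.choice(ALPHABET)
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + letter + word[i:]
    if op == "substitute":
        return word[:i] + letter + word[i + 1:]
    # transpose the characters at positions i and i + 1
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    original = "retrieval"
    misspelled = corrupt(original, rng)
    print(misspelled, dice(original, misspelled))
    print("database", dice(original, "database"))
```

A misspelling such as "retreival" still shares most of its bigrams with "retrieval", so the two words stay close under this measure, whereas a stemmer that expects correctly spelled input would typically map them to different stems.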


Proc. 16. Workshop über Grundlagen von Datenbanken, Monheim, Germany, 2004.