

Entity Identification in XML Documents


Leonardo Ribeiro

Kaiserslautern University of Technology
Dept. of Computer Science (AG DBIS)
P.O. Box 3049, 67653 Kaiserslautern, Germany
e-mail: aguiar@informatik.uni-kl.de

Theo Härder

Kaiserslautern University of Technology
Dept. of Computer Science (AG DBIS)
P.O. Box 3049, 67653 Kaiserslautern, Germany
e-mail: haerder@informatik.uni-kl.de



Abstract

With the spread of a large variety of XML databases, the well-known problem of data integration must now be addressed from the XML viewpoint. One of the basic functions of an integration system is record linkage: the task of comparing records to identify those that are represented differently but refer to the same real-world entity. Because of its intrinsically high computational cost, most approaches to record linkage rely on off-line procedures. Such approaches, however, only meet the requirements of data integration architectures that materialize the data, such as data warehouses. Recent approaches based on approximate joins aim to enable duplicate identification in on-line procedures with reasonable results. In this paper, we follow this research direction and outline our current ideas on how to account for the specific characteristics of XML documents.
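To make the notion of an approximate join more concrete, the following is a minimal sketch, not the authors' method. It assumes that records are flat strings and that similarity is measured as the Jaccard coefficient over padded character q-grams. Both choices are illustrative assumptions, since the abstract names neither a similarity function nor a threshold. The helpers `qgrams`, `jaccard`, and `approximate_join`, as well as the default threshold of 0.5, are hypothetical. The sketch also ignores XML structure entirely, which is precisely the aspect the paper sets out to address.

```python
def qgrams(s: str, q: int = 3) -> set[str]:
    """Return the set of q-grams of s (lower-cased), padded with '#'
    so that characters at the string boundaries also form full grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def approximate_join(left, right, threshold=0.5, q=3):
    """Hypothetical similarity join: return (l, r, sim) triples whose
    q-gram Jaccard similarity is at least `threshold`.

    This naive nested-loop version compares every pair, i.e. it has the
    quadratic cost that pushes many record-linkage systems to off-line
    processing; practical systems prune candidates, e.g. with an
    inverted index on q-grams.
    """
    right_grams = [(r, qgrams(r, q)) for r in right]  # precompute once
    result = []
    for l in left:
        lg = qgrams(l, q)
        for r, rg in right_grams:
            sim = jaccard(lg, rg)
            if sim >= threshold:
                result.append((l, r, sim))
    return result


if __name__ == "__main__":
    # A misspelled name still matches; an unrelated name does not.
    print(approximate_join(["Leonardo Ribeiro"], ["Leonardo Riberio", "Hans Meier"]))
```

Q-grams are used here because a typo changes only the few grams that overlap it, so differently spelled representations of the same entity keep a high overlap.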