Extended Affinity Propagation Clustering for Multi-source Entity Resolution
Abstract
Entity resolution is the data integration task of identifying matching entities (e.g. products, customers) in one or several data sources. Previous approaches for matching and clustering entities between multiple (>2) sources either treated the different sources as a single source or assumed that the individual sources are duplicate-free, so that only matches between sources have to be found. In this work we propose and evaluate a general Multi-Source Clean Dirty (MSCD) scheme with an arbitrary combination of clean (duplicate-free) and dirty sources. For this purpose, we extend a constraint-based clustering algorithm called Affinity Propagation (AP) for entity clustering with clean and dirty sources (MSCD-AP). We also consider a hierarchical version of it for improved scalability. Our evaluation considers a full range of datasets containing 0% to 100% of clean sources. We compare our proposed algorithms with other clustering schemes in terms of both match quality and runtime.
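The clean/dirty distinction above can be illustrated with a small check. This is a minimal sketch, not the MSCD-AP algorithm itself. It assumes that a clean (duplicate-free) source may contribute at most one entity to any cluster, while a dirty source may contribute several. The `(source, id)` entity representation, the function name `clean_source_violations`, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: an entity is a (source name, entity id) pair.
Entity = Tuple[str, str]


def clean_source_violations(cluster: Iterable[Entity], clean_sources: Set[str]) -> List[str]:
    """Return the clean sources that contribute more than one entity to a cluster.

    Assumption: a duplicate-free source cannot hold two records of the same
    real-world entity, so a valid cluster has at most one entity per clean source.
    """
    counts = Counter(source for source, _ in cluster)
    return sorted(s for s, n in counts.items() if s in clean_sources and n > 1)


# Hypothetical data: sources "A" and "B" are clean, "C" is dirty.
clean = {"A", "B"}
ok_cluster = [("A", "a1"), ("B", "b7"), ("C", "c2"), ("C", "c9")]
bad_cluster = [("A", "a1"), ("A", "a4"), ("C", "c2")]

print(clean_source_violations(ok_cluster, clean))   # []
print(clean_source_violations(bad_cluster, clean))  # ['A']
```

A check of this kind could be used to validate the output of any clustering scheme against a given mix of clean and dirty sources. How the constraint is enforced during clustering is described in the paper itself.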
- Citation
- BibTeX
Lerm, S., Saeedi, A. & Rahm, E. (2021). Extended Affinity Propagation Clustering for Multi-source Entity Resolution. In: Sattler, K.-U., Herschel, M. & Lehner, W. (Eds.), BTW 2021. Gesellschaft für Informatik, Bonn. (pp. 217-236). DOI: 10.18420/btw2021-11
@inproceedings{mci/Lerm2021,
author = {Lerm, Stefan AND Saeedi, Alieh AND Rahm, Erhard},
title = {Extended Affinity Propagation Clustering for Multi-source Entity Resolution},
booktitle = {BTW 2021},
year = {2021},
editor = {Kai-Uwe Sattler AND Melanie Herschel AND Wolfgang Lehner},
pages = {217--236},
doi = {10.18420/btw2021-11},
publisher = {Gesellschaft für Informatik},
address = {Bonn}
}
More Info
DOI: 10.18420/btw2021-11
ISBN: 978-3-88579-705-0
ISSN: 1617-5468
Date: 2021
Language: English (en)
