Extended Affinity Propagation Clustering for Multi-source Entity Resolution
dc.contributor.author | Lerm, Stefan | |
dc.contributor.author | Saeedi, Alieh | |
dc.contributor.author | Rahm, Erhard | |
dc.contributor.editor | Kai-Uwe Sattler | |
dc.contributor.editor | Melanie Herschel | |
dc.contributor.editor | Wolfgang Lehner | |
dc.date.accessioned | 2021-03-16T07:57:09Z | |
dc.date.available | 2021-03-16T07:57:09Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Entity resolution is the data integration task of identifying matching entities (e.g., products, customers) in one or several data sources. Previous approaches for matching and clustering entities across multiple (>2) sources either treated the different sources as a single source or assumed that the individual sources are duplicate-free, so that only matches between sources have to be found. In this work we propose and evaluate a general Multi-Source Clean Dirty (MSCD) scheme that supports an arbitrary combination of clean (duplicate-free) and dirty sources. For this purpose, we extend a constraint-based clustering algorithm called Affinity Propagation (AP) to entity clustering with clean and dirty sources (MSCD-AP). We also consider a hierarchical version of MSCD-AP for improved scalability. Our evaluation covers a full range of datasets in which the share of clean sources varies from 0% to 100%. We compare our proposed algorithms with other clustering schemes in terms of both match quality and runtime. | en |
dc.identifier.doi | 10.18420/btw2021-11 | |
dc.identifier.isbn | 978-3-88579-705-0 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/35794 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik, Bonn | |
dc.relation.ispartof | BTW 2021 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311 | |
dc.subject | Entity Resolution | |
dc.subject | Clustering | |
dc.subject | Affinity Propagation | |
dc.subject | MSCD-AP | |
dc.title | Extended Affinity Propagation Clustering for Multi-source Entity Resolution | en |
gi.citation.endPage | 236 | |
gi.citation.startPage | 217 | |
gi.conference.date | 13-17 September 2021 | |
gi.conference.location | Dresden | |
gi.conference.sessiontitle | Data Integration, Semantic Data Management, Streaming | |
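The abstract builds on Affinity Propagation (AP). As a rough illustration only, the following is a minimal pure-Python sketch of standard, unconstrained AP (responsibility/availability message passing over a similarity matrix whose diagonal holds the preferences). It is not MSCD-AP: the clean/dirty-source extensions and the hierarchical variant described in the paper are not reproduced here. The helper `clean_source_violations`, the toy data, the preference of -1.0 and the damping of 0.5 are hypothetical choices for the example. The helper merely checks the property implied by the definition of a clean (duplicate-free) source, namely that no cluster should contain two records from the same clean source.

```python
from typing import List, Sequence, Set


def affinity_propagation(S: List[List[float]], damping: float = 0.5,
                         iterations: int = 200) -> List[int]:
    """Standard (unconstrained) Affinity Propagation sketch.

    S is an n x n similarity matrix (n >= 2); S[k][k] is the preference of k.
    Returns, for every point, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: vals[k])
            first = vals[best]
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                other = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Points with positive self-evidence r(k,k) + a(k,k) become exemplars.
    evidence = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if evidence[k] > 0]
    if not exemplars:  # fallback: keep the single most likely exemplar
        exemplars = [max(range(n), key=lambda k: evidence[k])]
    # Exemplars label themselves; others join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def clean_source_violations(labels: Sequence[int], sources: Sequence[str],
                            clean: Set[str]) -> List[int]:
    """Hypothetical helper: exemplar ids of clusters holding two or more
    records from the same clean (duplicate-free) source."""
    seen = set()
    bad = set()
    for label, src in zip(labels, sources):
        if src in clean:
            if (label, src) in seen:
                bad.add(label)
            seen.add((label, src))
    return sorted(bad)


# Toy usage: six 1-D "records"; similarity is the negative squared distance.
xs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
S = [[-(a - b) ** 2 for b in xs] for a in xs]
for k in range(len(xs)):
    S[k][k] = -1.0  # preference: lower values tend to yield fewer exemplars
labels = affinity_propagation(S)
print(labels)
print(clean_source_violations(labels, ["A", "B", "A", "A", "B", "C"], {"A"}))
```

The preference on the diagonal is the main knob in standard AP: it controls how readily a point becomes an exemplar, and therefore how many clusters emerge. A constraint-aware variant such as the one the paper proposes would go further than the post-hoc check above, which only reports violations and does not prevent them.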