P311 - BTW2021- Datenbanksysteme für Business, Technologie und Web
Listing of P311 - BTW2021- Datenbanksysteme für Business, Technologie und Web by keyword "Affinity Propagation"
- Text document: Extended Affinity Propagation Clustering for Multi-source Entity Resolution (BTW 2021, 2021)
  Lerm, Stefan; Saeedi, Alieh; Rahm, Erhard
  Entity resolution is the data integration task of identifying matching entities (e.g., products, customers) in one or several data sources. Previous approaches for matching and clustering entities between multiple (>2) sources either treated the different sources as a single source or assumed that the individual sources are duplicate-free, so that only matches between sources have to be found. In this work we propose and evaluate a general Multi-Source Clean Dirty (MSCD) scheme with an arbitrary combination of clean (duplicate-free) and dirty sources. For this purpose, we extend a constraint-based clustering algorithm called Affinity Propagation (AP) for entity clustering with clean and dirty sources (MSCD-AP). We also consider a hierarchical version of it for improved scalability. Our evaluation considers a full range of datasets containing 0% to 100% of clean sources. We compare our proposed algorithms with other clustering schemes in terms of both match quality and runtime.
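
The abstract defines a clean source as duplicate-free. One consequence is that a cluster of matching entities should hold at most one entity from any clean source, while dirty sources may contribute several. The sketch below only illustrates that constraint check. It is not the MSCD-AP algorithm from the paper, and the representation of a cluster as `(entity_id, source_id)` pairs, the function name `clean_sources_violated`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def clean_sources_violated(cluster, clean_sources):
    """Return the clean sources that contribute more than one entity.

    cluster: iterable of (entity_id, source_id) pairs (hypothetical format).
    clean_sources: set of source ids assumed to be duplicate-free.
    """
    # Count entities per source, ignoring dirty sources entirely.
    counts = Counter(
        source for _, source in cluster if source in clean_sources
    )
    # A clean source with two or more entities in one cluster is a violation.
    return sorted(source for source, n in counts.items() if n > 1)


# Illustrative data: sources A and B are clean, source C is dirty.
clean = {"A", "B"}
cluster_ok = [("a1", "A"), ("b7", "B"), ("c3", "C"), ("c9", "C")]
cluster_bad = [("a1", "A"), ("a2", "A"), ("b7", "B")]

print(clean_sources_violated(cluster_ok, clean))   # []
print(clean_sources_violated(cluster_bad, clean))  # ['A']
```

With an empty set of clean sources (the 0% clean case) the check never reports a violation. With every source marked clean (the 100% case) each cluster is limited to one entity per source.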