Title: Extended Affinity Propagation Clustering for Multi-source Entity Resolution
Authors: Lerm, Stefan; Saeedi, Alieh; Rahm, Erhard
Editors: Kai-Uwe Sattler; Melanie Herschel; Wolfgang Lehner
Date: 2021-03-16
Year: 2021
ISBN: 978-3-88579-705-0
ISSN: 1617-5468
DOI: 10.18420/btw2021-11
URI: https://dl.gi.de/handle/20.500.12116/35794
Language: en
Keywords: Entity Resolution; Clustering; Affinity Propagation; MSCD-AP

Abstract: Entity resolution is the data integration task of identifying matching entities (e.g., products, customers) in one or several data sources. Previous approaches for matching and clustering entities between multiple (>2) sources either treated the different sources as a single source or assumed that the individual sources are duplicate-free, so that only matches between sources have to be found. In this work we propose and evaluate a general Multi-Source Clean Dirty (MSCD) scheme with an arbitrary combination of clean (duplicate-free) and dirty sources. For this purpose, we extend a constraint-based clustering algorithm called Affinity Propagation (AP) for entity clustering with clean and dirty sources (MSCD-AP). We also consider a hierarchical version of it for improved scalability. Our evaluation considers a full range of datasets containing 0% to 100% of clean sources. We compare our proposed algorithms with other clustering schemes in terms of both match quality and runtime.
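
The distinction between clean and dirty sources in the abstract can be illustrated with a small check. The sketch below is not the MSCD-AP algorithm. It only assumes that each cluster stands for one real-world entity, so a clean (duplicate-free) source should contribute at most one record to a cluster, while a dirty source may contribute several. The entity representation and the function name `clean_source_violations` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical representation: (entity_id, source_name)
Entity = Tuple[str, str]


def clean_source_violations(cluster: Iterable[Entity],
                            clean_sources: Set[str]) -> Set[str]:
    """Return the clean sources that contribute more than one entity
    to the given cluster (assumed constraint: at most one per clean source)."""
    counts = Counter(source for _, source in cluster)
    return {s for s, n in counts.items() if n > 1 and s in clean_sources}


# Source "A" is assumed clean, source "B" dirty.
cluster = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("b2", "B")]
print(clean_source_violations(cluster, {"A"}))  # {'A'}
```

Two records from the dirty source "B" in one cluster are acceptable under this assumption, whereas two records from the clean source "A" are flagged.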