Explainable Data Matching: Selecting Representative Pairs with Active Learning Pair-Selection Strategies

dc.contributor.author: Laskowski, Lukas
dc.contributor.author: Sold, Florian
dc.contributor.editor: König-Ries, Birgitta
dc.contributor.editor: Scherzinger, Stefanie
dc.contributor.editor: Lehner, Wolfgang
dc.contributor.editor: Vossen, Gottfried
dc.date.accessioned: 2023-02-23T14:00:22Z
dc.date.available: 2023-02-23T14:00:22Z
dc.date.issued: 2023
dc.description.abstract: In both research and enterprise, dirty data poses numerous challenges. Many data cleaning pipelines include a data deduplication step that detects and removes entries within a given dataset that refer to the same real-world entity. Throughout the development of such deduplication techniques, data scientists have to make sense of the large result sets that their matching solutions generate to quickly identify changes in behavior or to discover opportunities for improvement. We propose an approach that aims to select a small subset of pairs from the result set of a data matching solution that is representative of the matching solution’s overall behavior. To evaluate our approach, we show that a matching solution trained on pairs selected according to our strategy outperforms one trained on a randomly selected subset of pairs. [en]
dc.identifier.doi: 10.18420/BTW2023-77
dc.identifier.isbn: 978-3-88579-725-8
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/40387
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: BTW 2023
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-331
dc.subject: Entity Resolution
dc.subject: Data Matching
dc.subject: ExplainableDM
dc.subject: Pair Selection
dc.subject: Benchmark
dc.title: Explainable Data Matching: Selecting Representative Pairs with Active Learning Pair-Selection Strategies [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 1104
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 1099
gi.conference.date: March 6-10, 2023
gi.conference.location: Dresden, Germany

Files

Name: C4-7.pdf
Size: 795.43 KB
Format: Adobe Portable Document Format