Extended Affinity Propagation Clustering for Multi-source Entity Resolution
Entity resolution is the data integration task of identifying matching entities (e.g., products, customers) within one or across several data sources. Previous approaches for matching and clustering entities across multiple (>2) sources either treated the different sources as a single source or assumed that the individual sources ...
Graph Sampling with Distributed In-Memory Dataflow Systems
Given a large graph, graph sampling determines a subgraph that preserves the characteristics of the original graph with respect to certain metrics. The samples are much smaller, thereby accelerating and simplifying the analysis and visualization of large graphs. We focus on the implementation of distributed graph sampling for Big Data ...
Multi-Party Privacy Preserving Record Linkage in Dynamic Metric Space
We propose and evaluate several approaches for multi-party privacy-preserving record linkage (MP-PPRL) for multiple data sources. To reduce the number of comparisons and thus improve scalability, we propose a new pivot-based metric space approach that dynamically adapts the selection of pivots for additional sources and growing data ...