
Communication-Optimal Parallel Reservoir Sampling

dc.contributor.author: Winter, Christian
dc.contributor.author: Sichert, Moritz
dc.contributor.author: Birler, Altan
dc.contributor.author: Neumann, Thomas
dc.contributor.author: Kemper, Alfons
dc.contributor.editor: König-Ries, Birgitta
dc.contributor.editor: Scherzinger, Stefanie
dc.contributor.editor: Lehner, Wolfgang
dc.contributor.editor: Vossen, Gottfried
dc.date.accessioned: 2023-02-23T13:59:52Z
dc.date.available: 2023-02-23T13:59:52Z
dc.date.issued: 2023
dc.description.abstract [en]: When evaluating complex analytical queries on high-velocity data streams, many systems cannot run those queries on all elements of a stream. Sampling is a widely used method to reduce the system load by replacing the input with a representative yet manageable subset. For unbounded data, reservoir sampling generates a fixed-size uniform sample independent of the input cardinality. However, the collection of reservoir samples itself can already be a bottleneck for high-velocity data. In this paper, we introduce a technique that allows fully parallelizing reservoir sampling for many-core architectures. Our approach relies on the efficient combination of thread-local samples taken over chunks of the input without necessitating communication during the sampling phase and with minimal communication when merging. We show how our efficient merge guarantees uniform random samples while allowing data to be distributed over worker threads arbitrarily. Our analysis of this approach within the Umbra database system demonstrates linear scaling along the available threads and the ability to sustain high-velocity workloads.
dc.identifier.doi: 10.18420/BTW2023-27
dc.identifier.isbn: 978-3-88579-725-8
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/40332
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: BTW 2023
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-331
dc.subject: Reservoir Sampling
dc.subject: Parallel Sampling
dc.subject: Stream Processing
dc.title [en]: Communication-Optimal Parallel Reservoir Sampling
dc.type: Text/Conference Paper
gi.citation.endPage: 578
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 567
gi.conference.date: 6-10 March 2023
gi.conference.location: Dresden, Germany
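The abstract describes thread-local reservoir samples taken over chunks of the input, with no communication while sampling and a merge step that still yields a uniform sample. The following is a minimal sketch of that general idea, assuming a textbook merge (draw a hypergeometric split of the k output slots between two inputs, then subsample each local reservoir uniformly). It is not the Umbra implementation or necessarily the paper's merge; the function names `reservoir_sample`, `merge_reservoirs` and `parallel_reservoir` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the paper's API.
Reservoir = Tuple[List, int]  # (sampled items, number of input items seen)


def reservoir_sample(chunk: Sequence, k: int, rng: random.Random) -> Reservoir:
    """Classic Algorithm R over one chunk, using only thread-local state."""
    sample: List = []
    seen = 0
    for item in chunk:
        seen += 1
        if len(sample) < k:
            sample.append(item)
        else:
            # Keep the new item with probability k / seen, evicting a random slot.
            j = rng.randrange(seen)
            if j < k:
                sample[j] = item
    return sample, seen


def merge_reservoirs(a: Reservoir, b: Reservoir, k: int,
                     rng: random.Random) -> Reservoir:
    """Combine uniform samples of two disjoint inputs into a uniform k-sample."""
    sample_a, n_a = a
    sample_b, n_b = b
    take = min(k, n_a + n_b)
    # Sequentially draw without replacement from the n_a + n_b input positions;
    # the count that lands in input A is hypergeometrically distributed.
    rem_a, rem_b, from_a = n_a, n_b, 0
    for _ in range(take):
        if rng.randrange(rem_a + rem_b) < rem_a:
            from_a += 1
            rem_a -= 1
        else:
            rem_b -= 1
    from_b = take - from_a
    # A uniform subset of a uniform local sample is a uniform subset of that input.
    merged = rng.sample(sample_a, from_a) + rng.sample(sample_b, from_b)
    return merged, n_a + n_b


def parallel_reservoir(chunks: List[Sequence], k: int, seed: int = 0) -> Reservoir:
    """Sample each chunk independently, then fold the local reservoirs together."""
    # One RNG per worker so the sampling phase shares no mutable state.
    rngs = [random.Random(seed + i) for i in range(len(chunks))]
    with ThreadPoolExecutor() as pool:
        local = list(pool.map(reservoir_sample, chunks, [k] * len(chunks), rngs))
    merge_rng = random.Random(seed - 1)
    return reduce(lambda x, y: merge_reservoirs(x, y, k, merge_rng), local, ([], 0))


if __name__ == "__main__":
    data = list(range(10_000))
    chunks = [data[i::4] for i in range(4)]  # arbitrary split over 4 workers
    sample, n = parallel_reservoir(chunks, k=100)
    print(len(sample), n)  # 100 10000
```

Each local reservoir only needs its item count to be merged, so the merge exchanges k items plus one integer per worker. Because of CPython's global interpreter lock, the thread pool here only demonstrates the independence of the sampling phase, not the speedup reported for Umbra.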

Files

Original bundle
Name: B5-4.pdf
Size: 455.82 KB
Format: Adobe Portable Document Format