Lock-free Data Structures for Data Stream Processing

Authors: Baumstark, Alexander; Pohl, Constantin
Publication year: 2019
Repository record date: 2021-05-04
DOI link: http://dx.doi.org/10.1007/s13222-019-00329-4
Repository handle: https://dl.gi.de/handle/20.500.12116/36378

Abstract:
Processing data in real-time instead of storing and reading from tables has led to a specialization of DBMS into the so-called data stream processing paradigm. While high throughput and low latency are key requirements to keep up with varying stream behavior and to allow fast reaction to incoming events, there are many possibilities how to achieve them. In combination with modern hardware, like server CPUs with tens of cores, the parallelization of stream queries for multithreading and vectorization is a common schema. High degrees of parallelism, however, need efficient synchronization mechanisms to allow good scaling with threads for shared memory access.

In this work, we identify the most time-consuming operations for stream processing exemplarily for our own stream processing engine PipeFabric. In addition, we present different design principles of lock-free data structures which are suited to overcome those bottlenecks. We will finally demonstrate how lock-freedom greatly improves performance for join processing and tuple exchange between operators under different workloads. Nevertheless, the efficient usage of lock-free data structures comes with additional efforts and pitfalls, which we also discuss in this paper.

Keywords: Concurrent Data Structures; Lock-free; Parallelism; Stream Processing
Type: Text/Journal Article
DOI: 10.1007/s13222-019-00329-4
ISSN: 1610-1995