Datenbank-Spektrum 19(3), November 2019

Latest publications

  • Journal article
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019)
  • Journal article
    Lock-free Data Structures for Data Stream Processing
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Baumstark, Alexander; Pohl, Constantin
    Processing data in real time instead of storing it in and reading it from tables has led to a specialization of DBMS known as the data stream processing paradigm. While high throughput and low latency are key requirements for keeping up with varying stream behavior and for reacting quickly to incoming events, there are many ways to achieve them. On modern hardware, such as server CPUs with tens of cores, parallelizing stream queries through multithreading and vectorization is a common approach. High degrees of parallelism, however, require efficient synchronization mechanisms for shared memory access in order to scale well with the number of threads. In this work, we identify the most time-consuming operations in stream processing, using our own stream processing engine PipeFabric as an example. In addition, we present different design principles of lock-free data structures that are suited to overcoming these bottlenecks. Finally, we demonstrate how lock-freedom greatly improves performance for join processing and tuple exchange between operators under different workloads. Nevertheless, using lock-free data structures efficiently comes with additional effort and pitfalls, which we also discuss in this paper.
  • Journal article
    Evaluating the Vector Supercomputer SX-Aurora TSUBASA as a Co-Processor for In-Memory Database Systems
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Pietrzyk, Johannes; Habich, Dirk; Damme, Patrick; Focht, Erich; Lehner, Wolfgang
    In-memory column-store database systems are the state of the art for efficiently processing analytical workloads. In these systems, data compression and vectorization play an important role. Currently, vectorized processing is done using the regular SIMD (Single Instruction Multiple Data) extensions of modern processors. For example, Intel’s latest SIMD extension supports 512-bit vector registers, which allow 8 × 64-bit values to be processed in parallel. From a database system perspective, this vectorization technique is interesting not only for compression and decompression, to reduce computational overhead, but also for all database operators, such as joins, scans, and groupings. In contrast to these SIMD extensions, NEC Corporation has recently introduced a novel pure vector engine (supercomputer) as a co-processor, called SX-Aurora TSUBASA. This vector engine features a vector length of 16,384 bits and the world’s highest bandwidth of up to 1.2 TB/s, which makes it well suited to data-intensive applications such as in-memory database systems. In this paper, we therefore describe the unique architecture and properties of this novel vector engine. Moreover, we present selected evaluation results specific to in-memory column stores that show the benefits of this vector engine compared to regular SIMD extensions. Finally, we conclude with an outlook on our ongoing research activities in this direction.
  • Journal article
    Analyzing Temporal Graphs with Gradoop
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Rost, Christopher; Thor, Andreas; Rahm, Erhard
    The temporal analysis of evolving graphs is an important requirement in many domains but is hardly supported by current graph database and graph processing systems. We have therefore begun extending the distributed graph analysis framework Gradoop for temporal graph analysis by adding time properties to vertices, edges, and graphs and using them within graph operators. We outline these extensions and illustrate their use within analysis workflows. We further describe the implementation of the snapshot and diff operators and evaluate them.
  • Journal article
    Chain-detection Between Clusters
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Held, Janis; Beer, Anna; Seidl, Thomas
    Chains connecting two or more different clusters are a well-known problem of clustering algorithms such as DBSCAN or Single Linkage Clustering. Since even a small number of points resulting from, e.g., noise can form such a chain and build a bridge between different clusters, the results of the clustering algorithm can be distorted: several disparate clusters get merged into one. This single-link effect is well known, but to the best of our knowledge there are as yet no satisfactory solutions that extract these chains. We present a new algorithm that detects not only straight chains between clusters but also bent and noisy ones. Users can choose between eliminating one-dimensional and higher-dimensional chains connecting clusters in order to recover the underlying cluster structure. The desired straightness can also be set by the user. As this paper is an extension of [8], we apply our technique not only in combination with DBSCAN but also with single-link hierarchical clustering. On a real-world dataset containing traffic accidents in Great Britain, we were able to detect chains emerging from streets between cities and villages, which had led to clusters composed of diverse villages. Additionally, we analyzed the robustness of our approach with respect to the variance of chains in synthetic experiments.
  • Journal article
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Härder, Theo
  • Journal article
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019)
  • Journal article
    Particulate Matter Matters—The Data Science Challenge @ BTW 2019
    (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Meyer, Holger J.; Grunert, Hannes; Waizenegger, Tim; Woltmann, Lucas; Hartmann, Claudio; Lehner, Wolfgang; Esmailoghli, Mahdi; Redyuk, Sergey; Martinez, Ricardo; Abedjan, Ziawasch; Ziehn, Ariane; Rabl, Tilmann; Markl, Volker; Schmitz, Christian; Serai, Dhiren Devinder; Gava, Tatiane Escobar
    For the second time, the Data Science Challenge took place as part of the 18th symposium “Database Systems for Business, Technology and Web” (BTW) of the Gesellschaft für Informatik (GI). The challenge was organized by the University of Rostock and sponsored by IBM and SAP. This year, it focused on the integration, analysis, and visualization of data on particulate matter pollution. After a preselection round, the accepted participants had one month to adapt the approaches they had developed to a concrete problem, the actual challenge. The final presentations took place at BTW 2019 in front of the prize jury and the attending audience. In this article, we give a brief overview of the schedule and organization of the Data Science Challenge. In addition, the participants present the problem to be solved and their solutions.
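For the SX-Aurora TSUBASA article, the register widths quoted in its abstract translate into lanes per vector as follows. This is only arithmetic on the stated figures, assuming 64-bit values throughout; it says nothing about measured speedups.

```latex
\frac{512\ \text{bit}}{64\ \text{bit}} = 8 \ \text{values per AVX-512 register},
\qquad
\frac{16\,384\ \text{bit}}{64\ \text{bit}} = 256 \ \text{values per SX-Aurora vector},
\qquad
\frac{16\,384\ \text{bit}}{512\ \text{bit}} = 32
```

So a single SX-Aurora vector holds 32 times as many 64-bit values as one 512-bit SIMD register.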
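The snapshot and diff operators mentioned in the Gradoop article can be illustrated with a small sketch. It assumes that every vertex or edge carries a half-open validity interval [valid_from, valid_to). The class `Element`, the functions `as_of` and `diff`, and the labels "added", "removed", and "persistent" are hypothetical names chosen for illustration. They are not Gradoop's Java/Flink API, and the sketch is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical temporal element: valid in the half-open interval [valid_from, valid_to).
@dataclass(frozen=True)
class Element:
    eid: str
    valid_from: int
    valid_to: int  # exclusive upper bound


def as_of(elements: List[Element], t: int) -> List[Element]:
    """Snapshot: keep only the elements that are valid at time point t."""
    return [e for e in elements if e.valid_from <= t < e.valid_to]


def diff(elements: List[Element], t1: int, t2: int) -> Dict[str, str]:
    """Compare the snapshots at t1 and t2 and label each element (labels are illustrative)."""
    before = {e.eid for e in as_of(elements, t1)}
    after = {e.eid for e in as_of(elements, t2)}
    labels: Dict[str, str] = {}
    for eid in before | after:
        if eid in before and eid in after:
            labels[eid] = "persistent"
        elif eid in after:
            labels[eid] = "added"
        else:
            labels[eid] = "removed"
    return labels


if __name__ == "__main__":
    edges = [Element("e1", 0, 10), Element("e2", 5, 20), Element("e3", 0, 4)]
    print([e.eid for e in as_of(edges, 3)])     # ['e1', 'e3']
    print(sorted(diff(edges, 3, 6).items()))
    # [('e1', 'persistent'), ('e2', 'added'), ('e3', 'removed')]
```

Here the diff is computed from two snapshots. A distributed engine would more likely evaluate both validity predicates in a single pass over the elements.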
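The single-link effect described in the chain-detection article can be reproduced with a few points. The sketch cuts a single-link hierarchy at a distance threshold `eps`, which is equivalent to taking the connected components of the graph that links points at most `eps` apart. It only demonstrates the problem: a thin chain of points merges two otherwise separate clusters. It does not implement the authors' chain-detection algorithm. The function name and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def single_link_clusters(points: List[Point], eps: float) -> List[int]:
    """Single-link clustering cut at height eps, computed with union-find.

    Two points end up in the same cluster if a path of points exists in which
    consecutive points are at most eps apart.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive cluster ids 0, 1, ...
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


if __name__ == "__main__":
    a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    b = [(5.0, 0.0), (5.0, 1.0), (6.0, 0.0), (6.0, 1.0)]
    chain = [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]  # bridge between a and b
    print(len(set(single_link_clusters(a + b, 1.0))))          # 2
    print(len(set(single_link_clusters(a + chain + b, 1.0))))  # 1
```

Three bridging points are enough to merge the two groups, which is exactly the distortion that removing detected chains is meant to undo.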