P290 - BTW2019 - Datenbanksysteme für Business, Technologie und Web - Workshopband
Latest publications
- Text document: Workshop Digitale Lehre im Fach Datenbanken (BTW 2019 – Workshopband, 2019). Rakow, Thomas C.; Faeskorn-Woyke, Heide
- Text document: Workload-Driven Data Placement for GPU-Accelerated Database Management Systems (BTW 2019 – Workshopband, 2019). Schmidt, Christopher; Uflacker, Matthias. An increase in the memory capacity of current Graphics Processing Unit (GPU) generations and advances in multi-GPU systems enable a large unified GPU memory space to be utilized by modern coprocessor-accelerated Database Management Systems (DBMSs). We take this as an opportunity to revisit the idea of using GPU memory as a hot cache for the DBMS. In particular, we focus on data placement for the hot cache. Based on previous approaches and their shortcomings, we present a new workload-driven data placement for a GPU-accelerated DBMS. Lastly, we outline how we aim to implement and evaluate the proposed approach in future work by comparing it to existing data placement approaches.
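The core of such a hot-cache policy can be sketched as a greedy, frequency-based placement: rank columns by how often the workload touches them and cache the hottest ones until GPU memory is full. This is only a minimal illustration (assuming a columnar layout and a workload given as per-query column-access lists), not the authors' actual algorithm:

```python
from collections import Counter

def place_hot_columns(workload, column_sizes, gpu_capacity):
    """Greedy workload-driven placement: cache the most frequently
    accessed columns in GPU memory until the capacity is exhausted."""
    access_counts = Counter(col for query in workload for col in query)
    placement, used = [], 0
    # Hotter columns first; ties broken by smaller size.
    for col, _ in sorted(access_counts.items(),
                         key=lambda kv: (-kv[1], column_sizes[kv[0]])):
        size = column_sizes[col]
        if used + size <= gpu_capacity:
            placement.append(col)
            used += size
    return placement
```

A workload-driven approach like the paper's would additionally react to workload shifts rather than compute a single static placement.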
- Text document: Skew-resilient Query Processing for Fast Networks (BTW 2019 – Workshopband, 2019). Ziegler, Tobias; Binnig, Carsten; Röhm, Uwe. Motivation: Scalable distributed in-memory databases are at the core of data-intensive computation. Although scale-out solutions help to handle large amounts of data, more nodes do not necessarily lead to improved query performance. In fact, recent papers have shown that performance can even degrade when scaling out, due to higher communication overhead (e.g., shuffling data across nodes) and limited bandwidth [Rö15]. Thus, current distributed database systems are built on the assumption that the network is the major bottleneck [BH13] and should be avoided at all costs. In recent years, high-speed networks (e.g., InfiniBand (IB)) with a bandwidth close to that of the local memory bus [Bi16] have become economically viable. These network technologies provide Remote Direct Memory Access (RDMA), which allows direct memory access to a remote host and reduces the latency of data transfer by bypassing the remote host's CPU [In17, Gr10]. Therefore, the assumption that the network is the bottleneck no longer holds. Consequently, recent research has focused on integrating RDMA-enabled high-speed networks into existing database systems designed along a Shared-Nothing Architecture (SN) [Rö16, LYB17]. This architecture co-locates computation and data to reduce the communication overhead in a cluster. Although combining an SN with IB's higher network bandwidth enables scalability to a certain extent, this approach fails if the data or workload is skewed and cannot be evenly partitioned. The root cause is that classical query execution schemes assume that each partition is processed by one node. Since nodes with larger partitions must process more data, they may become a bottleneck and hinder overall scalability.
In consequence, merely utilizing the higher bandwidth without adapting the database architecture and query execution does not automatically lead to improved scalability [Bi16]. Contributions: In this paper, we present a new approach to executing distributed queries on fast networks with RDMA. Our main contribution is a novel execution strategy that enables collaborative query processing via remote work stealing to mitigate skew, a common issue that hinders scalable query execution [WDJ91, Ly88]. Moreover, we implement this execution strategy in our prototype engine I-Store and show that it introduces almost no overhead to handle skew.
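The work-stealing idea behind this execution strategy can be illustrated with a small single-threaded simulation (a sketch only; the paper's engine steals work over RDMA, not via shared queues): each worker drains its own partition, and an idle worker takes tuples from the most loaded peer, so a skewed partition no longer serializes the query.

```python
from collections import deque

def process_with_stealing(partitions):
    """Simulate collaborative query processing: each worker drains its own
    partition; an idle worker steals a tuple from the most loaded peer."""
    queues = [deque(p) for p in partitions]
    processed = [0] * len(partitions)
    while any(queues):
        for w, q in enumerate(queues):
            if q:
                q.popleft()              # process a local tuple
                processed[w] += 1
            else:
                # Remote work stealing: grab work from the largest partition.
                victim = max(queues, key=len)
                if victim:
                    victim.pop()
                    processed[w] += 1
    return processed
```

With a skewed input such as `[[1, 2, 3, 4, 5, 6], [7]]`, both workers end up processing a near-equal share instead of worker 0 doing almost everything.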
- Text document: An Overview of Hawk: A Hardware-Tailored Code Generator for the Heterogeneous Many Core Age (BTW 2019 – Workshopband, 2019). Breß, Sebastian; Funke, Henning; Zeuch, Steffen; Rabl, Tilmann; Markl, Volker. Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall and deliver improved performance. Currently, database engines have to be manually optimized for each processor, which is a costly and error-prone process. In this paper, we provide a summary of our recent VLDB Journal publication, where we propose concepts to adapt to performance enhancements of modern processors and to exploit their capabilities automatically. Our key idea is to create processor-specific code variants and to learn a well-performing code variant for each processor. These code variants leverage various parallelization strategies and apply both generic and processor-specific code transformations. We observe that the performance of code variants may diverge by up to two orders of magnitude. Thus, we need to generate custom code for each processor to achieve peak performance. Hawk automatically finds efficient code variants for CPUs, GPUs, and MICs.
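Reduced to its core, "learning a well-performing variant" means measuring candidate implementations on the target machine and keeping the fastest. The sketch below shows only that selection step (Hawk actually generates processor-specific code; the variant functions here are stand-ins):

```python
import time

def pick_fastest_variant(variants, data, runs=3):
    """Time each code variant on the current machine and return the name
    of the fastest one (a stand-in for learning per-processor variants)."""
    timings = {}
    for name, fn in variants.items():
        start = time.perf_counter()
        for _ in range(runs):
            fn(data)
        timings[name] = (time.perf_counter() - start) / runs
    return min(timings, key=timings.get)
```

In a real system the measured choice would be cached per processor, since the winning variant differs between, say, a CPU and a GPU.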
- Text document: Query Planning for Transactional Stream Processing on Heterogeneous Hardware: Opportunities and Limitations (BTW 2019 – Workshopband, 2019). Götze, Philipp; Pohl, Constantin; Sattler, Kai-Uwe. In a heterogeneous hardware landscape consisting of various processing units and memory types, it is crucial to decide which device should be used when running a query. A lot of research has already been done on placement decisions for CPUs, coprocessors, GPUs, or FPGAs. However, those decisions can be further refined across the various types of memory within the same layer of the memory hierarchy. For storage, a division between SSDs, HDDs, or even NVM is possible, whereas for main memory, types like DDR4 and HBM exist. In this paper, we focus on query planning for the transactional stream processing model. We give an overview of several techniques and necessary parameters for optimizing a stateful query for various memory types, illustrated with selected experimental measurements to support our claims.
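The memory-placement decision the abstract describes can be illustrated with a toy cost model: place an operator's state on the memory type with the lowest expected cost, where sequential scans are bandwidth-bound and random lookups are latency-bound. All figures below are illustrative assumptions, not measured values from the paper:

```python
def choose_memory(state_size_gb, access_pattern, memories):
    """Toy cost model for placing operator state on a memory type:
    sequential scans are bandwidth-bound, random lookups latency-bound."""
    def cost(spec):
        if state_size_gb > spec["capacity_gb"]:
            return float("inf")          # state does not fit at all
        if access_pattern == "sequential":
            return state_size_gb / spec["bandwidth_gbs"]
        return spec["latency_ns"]
    return min(memories, key=lambda name: cost(memories[name]))

# Illustrative figures only: HBM offers higher bandwidth but less
# capacity and (here) slightly higher latency than DDR4.
MEMORIES = {
    "DDR4": {"capacity_gb": 256, "bandwidth_gbs": 90,  "latency_ns": 80},
    "HBM":  {"capacity_gb": 16,  "bandwidth_gbs": 400, "latency_ns": 100},
}
```

A real planner would fold in many more parameters (persistence for NVM, contention, NUMA distance), which is exactly the parameter space the paper surveys.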
- Text document: ReProVide: Towards Utilizing Heterogeneous Partially Reconfigurable Architectures for Near-Memory Data Processing (BTW 2019 – Workshopband, 2019). Becher, Andreas; Herrmann, Achim; Wildermann, Stefan; Teich, Jürgen. Reconfigurable hardware such as Field-Programmable Gate Arrays (FPGAs) is widely used for data processing in databases. Most of the related work focuses on accelerating one operation or a small set of specific operations such as sort, join, or regular expression matching. A drawback of such approaches is often the assumed static accelerator hardware architecture: rather than adapting the hardware to fit the query, the query plan has to be adapted to fit the hardware. Moreover, operators or data types that are not supported by the accelerator have to be processed in software. As a remedy, approaches that exploit the dynamic partial reconfigurability of FPGAs have been proposed, which can adapt the datapath at runtime. However, on modern FPGAs this introduces new challenges due to the heterogeneity of the available resources. In addition, not only the execution resources but also the memory resources may be heterogeneous. This work focuses on the architectural aspects of database (co-)processing on heterogeneous FPGA-based PSoC (programmable System-on-Chip) architectures, including processors, specialized hardware components, multiple memory types, and dynamically partially reconfigurable areas. We present an approach called ReProVide to support such (co-)processing. In particular, we introduce a model that formalizes the challenging task of placing operators and allocating buffers onto such heterogeneous hardware, and we describe the difficulties of finding good placements. Furthermore, we give a detailed insight into different memory types and their peculiarities in order to exploit the strengths of heterogeneous memory architectures. Here, we also highlight the implications of heterogeneous memories for the problem of query placement.
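The flavor of the operator-placement problem can be conveyed by a greedy sketch: each operator has a resource demand, each reconfigurable region a capacity and a cost, and we place the largest operators first onto the cheapest region that still fits them. This is a deliberately simplified stand-in, not the ReProVide model itself (which also covers buffer allocation and heterogeneous memories):

```python
def place_operators(operators, regions):
    """Greedy sketch of operator placement onto partially reconfigurable
    regions: largest operators first, each onto the cheapest region
    with enough free resources."""
    free = {name: spec["resources"] for name, spec in regions.items()}
    placement = {}
    for op, demand in sorted(operators.items(), key=lambda kv: -kv[1]):
        fitting = [r for r in regions if free[r] >= demand]
        if not fitting:
            raise ValueError(f"no region can host operator {op}")
        target = min(fitting, key=lambda r: regions[r]["cost"])
        placement[op] = target
        free[target] -= demand
    return placement
```

Even this simplified form is a bin-packing-style problem, which hints at why the paper calls finding good placements difficult.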
- Text document: BTW2019 - Datenbanksysteme für Business, Technologie und Web - Workshopband (BTW 2019 – Workshopband, 2019). Meyer, Holger; Ritter, Norbert; Thor, Andreas; Nicklas, Daniela; Heuer, Andreas; Klettke, Meike
- Text document: Deep Learning zur Vorhersage von Feinstaubbelastung (BTW 2019 – Workshopband, 2019). Alkhouri, Georges; Wilke, Moritz. Particulate pollution has been part of the public debate for some time and very likely poses a serious health risk. According to the WHO [Or06], reducing particulate matter can help lower the incidence of various diseases such as heart attacks, lung cancer, and asthmatic illnesses. The organization therefore recommends daily limits of 25 µg/m³ for particles around 2.5 µm (PM2.5) and 50 µg/m³ for particles around 10 µm (PM10). This contribution to the Data Science Challenge shows how the existing particulate sensors in the city of Leipzig can be used to predict future values. Such a prediction could serve not only as a warning but also as the basis for short-term countermeasures (e.g., switching to public transport).
- Text document: Peaks and the Influence of Weather, Traffic, and Events on Particulate Pollution (BTW 2019 – Workshopband, 2019). Hagedorn, Stefan; Sattler, Kai-Uwe. The task of the Data Science Challenge at the BTW 2019 conference is to analyze air quality data collected by the luftdaten project. This project provides sensor measurements recorded by volunteers around the world. With do-it-yourself setups, people can deploy their own sensors and report various environmental values to the project's servers, where they are made available as open data for further analyses. Thus, data is available only in regions where volunteers have decided to participate in the project. Since only very few sensors are present in our city, Ilmenau, as well as in the state of Thuringia, we decided to shift our focus to a broader area around Thuringia.
- Text document: Prediction of air pollution with machine learning (BTW 2019 – Workshopband, 2019). Schmitz, Christian; Serai, Dhiren Devinder; Escobar Gava, Tatiane. Cities worldwide are facing air quality issues, leading to vehicle bans and a lower quality of life for inhabitants. We forecast the air quality for Stuttgart based on expected weather conditions. For that purpose, we extract, cleanse, and integrate the data of the DHT22 and SDS011 sensors to feed two different machine learning models that predict particulate matter values for the near future.
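A minimal stand-in for the forecasting task shared by these challenge contributions is an AR(1) model fitted by least squares: predict the next particulate reading from the current one. This is purely illustrative and not the authors' models (which additionally use weather features and machine learning):

```python
def fit_ar1(series):
    """Least-squares AR(1) fit: pm[t+1] ~ a * pm[t] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def forecast(series, steps, a, b):
    """Roll the fitted model forward to predict the next readings."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = a * last + b
        preds.append(last)
    return preds
```

On a perfectly linear series such as `[1, 2, 3, 4, 5]` the fit recovers `a = 1, b = 1` exactly and forecasts `6, 7`; real PM time series would of course need weather covariates, as the abstract describes.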