Browsing by author "Habich, Dirk"
Showing 1 - 10 of 33
- Text document: 1st Workshop on Novel Data Management Ideas on Heterogeneous (Co-)Processors (NoDMC) (BTW 2019 – Workshopband, 2019). Broneske, David; Habich, Dirk
- Text document: Aggregate-based Training Phase for ML-based Cardinality Estimation (BTW 2021, 2021). Woltmann, Lucas; Hartmann, Claudio; Habich, Dirk; Lehner, Wolfgang. Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, many training queries have to be executed during the model training phase to learn a data-dependent ML model, which makes this phase very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and differ only in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As our evaluation with different workloads shows, we achieve an average speedup of 63 with our aggregate-based training phase and thus outperform indexes.
- Conference paper: Annotationsbasierte Prozessmodellierung in SOA – dargestellt an einem Beispiel aus dem Precision Dairy Farming (Precision Agriculture Reloaded – Informationsgestützte Landwirtschaft, 2010). Gietl, Franziska; Spilke, Joachim; Habich, Dirk; Lehner, Wolfgang. While developing a service-oriented architecture in the area of precision dairy farming, we dealt with modeling cross-company processes using the Business Process Modeling Notation (BPMN). Because this modeling is very abstract in places, we propose an adapted modeling approach that uses annotations. This allows necessary conditions to be assigned directly to the object concerned, which makes the modeling more domain-specific and thus more transparent for the user.
- Conference paper: Data-Warehousing 3.0 – Die Rolle von Data-Warehouse-Systemen auf Basis von In-Memory-Technologie (IMDM 2011 – Proceedings zur Tagung Innovative Unternehmensanwendungen mit In-Memory Data Management, 2011). Thiele, Maik; Lehner, Wolfgang; Habich, Dirk. In this paper we address the question of what role current hardware and software trends for database systems play as enablers of novel concepts in data warehousing. The tight coupling to operational systems, which enables direct feedback into or embedding in operational business processes, is seen as the central step in the evolution of data warehousing. We discuss how in-memory technology supports or enables the concept of real-time DWH systems. To this end, we first present a reference architecture for DWH systems that in particular takes push- and pull-based data provisioning into account. Second, we discuss the role of in-memory systems with regard to specific aspects such as optional persistence layers, reduction of the batch size, the positioning of in-memory techniques for building a corporate memory, and the fast provisioning of external data sets to support situational BI scenarios.
- Journal: Diversity of Processing Units (Datenbank-Spektrum: Vol. 18, No. 1, 2018). Lehner, Wolfgang; Ungethüm, Annett; Habich, Dirk
- Conference paper: Efficient in-memory indexing with generalized prefix trees (Datenbanksysteme für Business, Technologie und Web (BTW), 2011). Boehm, Matthias; Schlegel, Benjamin; Volk, Peter Benjamin; Fischer, Ulrike; Habich, Dirk; Lehner, Wolfgang. Efficient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used, with minor changes, as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures, their high memory consumption (created and accessed nodes), could be improved. In this paper, we argue for reconsidering prefix trees as in-memory index structures and we present the generalized trie, which is a prefix tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data. In conclusion, the generalized trie is applicable as a general-purpose in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data management systems that require balanced read/write performance.
- Conference paper: Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Ungethüm, Annett; Kissinger, Thomas; Mentzel, Willi-Wolfram; Mier, Eric; Habich, Dirk; Lehner, Wolfgang. Energy awareness of database systems has emerged as a critical research topic, because energy consumption is becoming a major factor. Recent energy-related hardware developments tend towards offering more and more configuration opportunities for the software to control its own energy-based behavior. Existing research within the DB community so far mainly focused on leveraging this configuration spectrum to identify the most energy-efficient configuration for specific operators or entire queries. In [Un16], we introduced the concept of energy elasticity and proposed the energy-control loop as an implementation of this concept. Energy elasticity refers to the ability of software to behave energy-proportionally and energy-efficiently at the same time while maintaining a certain quality of service.
- Journal article: Evaluating the Vector Supercomputer SX-Aurora TSUBASA as a Co-Processor for In-Memory Database Systems (Datenbank-Spektrum: Vol. 19, No. 3, 2019). Pietrzyk, Johannes; Habich, Dirk; Damme, Patrick; Focht, Erich; Lehner, Wolfgang. In-memory column-store database systems are state of the art for the efficient processing of analytical workloads. In these systems, data compression as well as vectorization play an important role. Currently, the vectorized processing is done using regular SIMD (Single Instruction Multiple Data) extensions of modern processors. For example, Intel’s latest SIMD extension supports 512-bit vector registers, which allow eight 64-bit values to be processed in parallel. From a database system perspective, this vectorization technique is very interesting not only for compression and decompression, to reduce the computational overhead, but also for all database operators such as joins, scans, and groupings. In contrast to these SIMD extensions, NEC Corporation has recently introduced a novel pure vector engine (supercomputer) as a co-processor called SX-Aurora TSUBASA. This vector engine features a vector length of 16,384 bits with the world’s highest bandwidth of up to 1.2 TB/s, which is a perfect fit for data-intensive applications like in-memory database systems. Therefore, we describe the unique architecture and properties of this novel vector engine in this paper. Moreover, we present selected in-memory column-store-specific evaluation results to show the benefits of this vector engine compared to regular SIMD extensions. Finally, we conclude the paper with an outlook on our ongoing research activities in this direction.
- Conference paper: Feingranulare Verarbeitung von XML-Strömen (Informatik 2004 – Informatik verbindet – Band 1, Beiträge der 34. Jahrestagung der Gesellschaft für Informatik e.V. (GI), 2004). Schmidt, Sven; Habich, Dirk; Lehner, Wolfgang. Data originating from a multitude of sources often have a transient character and are disseminated in the form of data streams. For adequate processing, so-called data stream management systems (DSMS) exist, which evaluate user-specified queries over the data streams in a stream-based manner and output the results continuously. This paper shows that sophisticated processing operations can be realized on hierarchically structured data streams. It motivates the use of XML in the context of data stream management systems and sketches an adapted processing model based on XML-Creeks.
- Text document: Fighting the Duplicates in Hashing: Conflict Detection-aware Vectorization of Linear Probing (BTW 2019, 2019). Pietrzyk, Johannes; Ungethüm, Annett; Habich, Dirk; Lehner, Wolfgang. Hash tables are a core data structure in database systems, because they are fundamental for many database operators like hash-based join and aggregation. In recent years, the efficient vectorized implementation using SIMD (Single Instruction Multiple Data) instructions has attracted a lot of attention. Generally, all hash table implementations need to address what happens when collisions occur. In order to do that, the collisions have to be detected first. There are two types of collisions: (i) key duplicates and (ii) hash value duplicates. The second type is more complicated than the first type. In this paper, we investigate linear probing as a heavily applied hash table implementation and we present an extension of the state-of-the-art vectorized implementation with a hardware-supported duplicate or collision detection. For that, we use novel SIMD instructions which have been introduced with Intel’s SIMD instruction set extension AVX-512. As we are going to show, our approach outperforms the state-of-the-art vectorized version for the key handling, but introduces novel challenges for the value handling. We conclude the paper with some ideas on how to tackle those challenges.
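
The last abstract above distinguishes two collision types in linear probing: key duplicates and hash value duplicates. As a rough illustration only, the following scalar Python sketch shows where each case arises during an insert. It is not the paper's vectorized AVX-512 implementation; the class name, the modulo hash function, the summing of values on a key duplicate, and the returned status strings are all assumptions made for this example.

```python
from typing import List, Optional

EMPTY = None  # marker for an unused slot


class LinearProbingTable:
    """Scalar linear-probing hash table (illustrative only: no SIMD, no resizing)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.keys: List[Optional[int]] = [EMPTY] * capacity
        self.values: List[int] = [0] * capacity

    def _slot(self, key: int) -> int:
        # Hypothetical hash function: plain modulo, so collisions are easy to provoke.
        return key % self.capacity

    def insert(self, key: int, value: int) -> str:
        """Insert key/value and report how the insert was resolved."""
        pos = self._slot(key)
        probed = False
        for _ in range(self.capacity):
            if self.keys[pos] is EMPTY:
                # Free slot found: store the new entry here.
                self.keys[pos] = key
                self.values[pos] = value
                return "probed" if probed else "inserted"
            if self.keys[pos] == key:
                # Key duplicate: the same key is already stored; aggregate the value.
                self.values[pos] += value
                return "key-duplicate"
            # Slot is occupied by a different key (a hash value duplicate or a
            # key displaced by earlier probing): move on to the next slot.
            pos = (pos + 1) % self.capacity
            probed = True
        raise RuntimeError("table is full")
```

In this sketch, keys 1 and 9 map to the same slot in a table of capacity 8, so inserting 9 after 1 has to probe one slot further, whereas inserting 1 a second time hits the key-duplicate branch and only updates the stored value.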