Listing of P331 - BTW2023 - Datenbanksysteme für Business, Technologie und Web, by publication date
1 - 10 of 80
- Conference paper: Priority queues for database query processing (BTW 2023, 2023) Graefe, Goetz. Interesting orderings let sort-based query processing outperform hash-based algorithms, but only tree-of-losers priority queues and offset-value coding permit competing in all cases, including large unsorted inputs with large or complex keys. As long as this competition persists, alternative algorithms with equivalent functionality will plague query execution, e.g., in software maintenance and in query plan scheduling; and mistaken algorithm choices will plague query optimization, e.g., for joins, intersection, and grouping. After explaining tree-of-losers priority queues and offset-value coding, our work introduces necessary extensions for efficient run generation (in external merge sort) with variable-size records. The required changes in tree-of-losers priority queues support increasing and decreasing any key value at any time in logarithmic time, including incremental maintenance of offset-value codes, and with the expected time for key value increases independent of the size of the priority queue. As priority queues are widely used in all kinds of scheduling applications, our contributions go beyond database query processing. Double-ended priority queues are discussed in detail as they nicely illustrate the concepts. To the best of our knowledge, this is the first time that tree-of-losers priority queues have been extended to addressable priority queues and to non-monotonic sequences of input keys, and that offset-value coding has been extended to non-monotonic sequences of input keys. The proposed solutions and the included code snippets are simple, small, and fast, in contrast to the time and effort spent on bringing them to this state.
- Conference paper: Which Rules Entail this Fact? - An Efficient Approach Using RDBMSs (BTW 2023, 2023) Gutberlet, Tim; Sauerbier, Janik. In this paper, we focus on the problem of identifying all rules that entail a certain target fact, given a knowledge graph and a set of previously learned rules. This problem is relevant in the context of link prediction and explainability. We propose an efficient approach using relational database technology, including indexing, filtering, and pre-computing methods. Our experiments demonstrate the efficiency of our approach and the effect of various optimizations on different datasets such as YAGO3-10, WN18RR, and FB15k-237, using rules learned by the bottom-up rule learner AnyBURL.
- Conference paper: Learn What Really Matters: A Learning-to-Rank Approach for ML-based Query Optimization (BTW 2023, 2023) Behr, Henriette; Markl, Volker; Kaoudi, Zoi. Query optimization is crucial for any data management system to achieve good performance. Recent advancements in Machine Learning (ML) have led to several efforts in the database research community that aim at improving query optimization with the help of ML. In particular, many works propose replacing the cost model used during plan enumeration with an ML model. The goal of these works is to learn a regression model from previously executed query plans that estimates the runtime of a given plan. Interestingly, it is well known that what really matters in query optimization is the order of the query plans and not their actual cost or runtime. We thus take a learning-to-rank approach and propose a novel neural network model architecture that considers a plan in comparison with other equivalent plans that belong to the same query. We use our model architecture together with a loss function that incorporates ranking metrics into the learning process to highlight the learning-to-rank objective. To enable training, we first extract features from query plans by adapting a state-of-the-art deep learning approach so that all features are independent of the input dataset schema. Second, we devise two score functions that map the runtime of plans to scores, which are then used as labels. We integrate the trained model into an adapted bottom-up plan enumeration algorithm that finds the best possible execution plan for a given query. We evaluate our approach against two state-of-the-art ML models and the highly tuned cost model of a commercial database and measure the runtime of the plans chosen in each case when executed in the database. We show that our approach achieves up to an order of magnitude better query performance than the comparison models and is able to either match (for short and medium-running queries) or outperform the commercial database (up to 5x for long-running queries).
- Conference paper: IBM Data Gate: Making On-Premises Mainframe Databases Available to Cloud Applications (BTW 2023, 2023) Stolze, Knut; Beier, Felix; Dimov, Vassil; Kalogeiton, Eirini; Tošić, Mateo. Many companies use databases on the mainframe for their mission-critical applications. They will continue to do so in the future. It is important to exploit this existing data for analysis and business decisions via modern applications that are often built for cloud environments. IBM Db2 for z/OS Data Gate (Data Gate) bridges the gap between mainframe databases and such cloud-native applications. It offers high-performance data synchronization connecting both worlds, while providing data coherence at the level of individual transactions. Data Gate is a hybrid cloud solution, which protects existing systems and applications (and investments in those) while enabling new use cases to work with and analyze mainframe data. It evolved from the IBM Db2 Analytics Accelerator (IDAA) technology by adjusting the architecture and some of the functionality. In this paper, we give an overview of Data Gate and how it addresses typical ETL issues like code page conversions, data coherence, encryption, or integration with other cloud services. We also describe how Data Gate can be used to handle query acceleration or archiving of cold data, just like IDAA did. Along the way, we highlight key differences between the two products.
- Conference paper: Accelerating Large Table Scan using Processing-In-Memory Technology (BTW 2023, 2023) Baumstark, Alexander; Jibril, Muhammad Attahir; Sattler, Kai-Uwe. Today’s systems are capable of storing large amounts of data in main memory. In-memory DBMSs can benefit particularly from this development. However, the processing of the data from the main memory necessarily has to run via the CPU. This creates a bottleneck, which affects the possible performance of the DBMS. The Processing-In-Memory (PIM) technology is a paradigm to overcome this problem, which was not available in commercial systems for a long time. However, with the availability of UPMEM, a commercial system is finally available that provides PIM technology in hardware. In this work, the main focus was on the optimization of the table scan, a fundamental and memory-bound operation. Here a possible approach is shown, which can be used to optimize this operation by using PIM. This method was then tested for parallelism and execution time in benchmarks with different table sizes and compared to the usual table scan. The result is a table scan that significantly outperforms the usual scan on the CPU.
- Conference paper: Working with Disaggregated Systems. What are the Challenges and Opportunities of RDMA and CXL? (BTW 2023, 2023) Geyer, Andreas; Ritter, Daniel; Lee, Dong Hun; Ahn, Minseon; Pietrzyk, Johannes; Krause, Alexander; Habich, Dirk; Lehner, Wolfgang. The usage of disaggregated systems in large-scale data centers offers a lot of flexibility and easy scalability in comparison to the traditional statically configured scale-up and scale-out systems. Disaggregated architectures allow for the creation of software-composable systems, in which a virtual machine is assembled by software out of the pool of available hardware resources. In this paper, we propose a memory disaggregation classification and applicable use cases. We would be delighted to present our ideas and the memory disaggregation classification at the workshop and discuss the presented ideas. The valuable feedback of the attendees will help us to further refine our classification both in terms of precision and applicability.
- Conference paper: Adaptive Architectures for Robust Data Management Systems (BTW 2023, 2023) Bang, Tiemo. 'Form follows function' is a well-known expression by the architect Sullivan asserting that the architecture of a building should follow its function. 'Adaptive Architectures for Robust Data Management Systems' is a dissertation asserting that DBMS architectures should follow changing workload and hardware to robustly achieve high DBMS performance. The dissertation first evaluates how workload and hardware affect the performance of DBMSs with static architectures. This evaluation concludes that static DBMS architectures degrade DBMS performance under changing workload and hardware, and hence the DBMS architecture has to become adaptive. Subsequently, adaptation concepts for the architecture of single-server and multi-server DBMSs are proposed. These concepts focus on fine-grained adaptation of DBMS architectures and are realized through asynchronous programming models. These programming models decouple the implementation of DBMS components from fine-grained architectural optimization. Thereby, optimizers can derive novel architectures better fitting individual DBMS components, leading to high and robust DBMS performance under changing conditions.
- Conference paper: Workload Prediction for IoT Data Management Systems (BTW 2023, 2023) Burrell, David; Chatziliadis, Xenofon; Zacharatou, Eleni Tzirita; Zeuch, Steffen; Markl, Volker. The Internet of Things (IoT) is an emerging technology that allows numerous devices, potentially spread over a large geographical area, to collect and collectively process data from high-speed data streams. To that end, specialized IoT data management systems (IoTDMSs) have emerged. One challenge in those systems is the collection of different metrics from devices in a central location for analysis. This analysis allows IoTDMSs to maintain an overview of the workload on different devices and to optimize their processing. However, as an IoT network comprises many heterogeneous devices with low computation resources and limited bandwidth, collecting and sending workload metrics can cause increased latency in data processing tasks across the network. In this ongoing work, we present an approach to avoid unnecessary transmission of workload metrics by predicting CPU, memory, and network usage using machine learning (ML). Specifically, we demonstrate the performance of two ML models, linear regression and a Long Short-Term Memory (LSTM) neural network, and show the features that we explored to train these models. This work is part of ongoing research to develop a monitoring tool for our new IoTDMS named NebulaStream.
- Conference paper: Reliable Rules for Relation Extraction in a Multimodal Setting (BTW 2023, 2023) Engelmann, Björn; Schaer, Philipp. We present an approach to extract relations from multimodal documents using only a small amount of training data. Furthermore, we derive explanations in the form of extraction rules from the underlying model to ensure the reliability of the extraction. Finally, we will evaluate how reliable (high model fidelity) the extracted rules are and which type of classifier is suitable in terms of F1 score and explainability. Our code and data are available at https://osf.io/dn9hm/?view_only=7e65fd1d4aae44e1802bb5ddd3465e08.
- Conference paper: SportsTables: A new Corpus for Semantic Type Detection (BTW 2023, 2023) Langenecker, Sven; Sturm, Christoph; Schalles, Christian; Binnig, Carsten. Table corpora such as VizNet or TURL, which contain annotated semantic types per column, are important for building machine learning models for the task of automatic semantic type detection. However, there is a huge discrepancy between the corpora used for training and testing and real-world data lakes, which contain a huge fraction of numerical data that is not present in existing corpora. Hence, in this paper, we introduce a new corpus that contains a much higher proportion of numerical columns than existing corpora. To reflect the distribution in real-world data lakes, our corpus SportsTables has on average approx. 86% numerical columns, posing new challenges to existing semantic type detection models, which have mainly targeted non-numerical columns so far. To demonstrate this effect, we show the results of a first study using a state-of-the-art approach for semantic type detection on our new corpus and demonstrate significant performance differences in predicting semantic types for textual and numerical data.
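
To make the "proportion of numerical columns" figure in the SportsTables entry concrete, the following is a minimal sketch of how such a share could be computed for a single table. It is not the authors' method: the rule that a column counts as numerical when every non-empty cell parses as a float, the names `is_numeric` and `numeric_column_share`, and the toy table are all assumptions made for illustration.

```python
from typing import Dict, List


def is_numeric(value: str) -> bool:
    """Return True if the cell text parses as a float (assumed rule)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def numeric_column_share(table: Dict[str, List[str]]) -> float:
    """Fraction of columns whose non-empty cells are all numeric."""
    if not table:
        return 0.0
    numeric = 0
    for cells in table.values():
        non_empty = [c for c in cells if c.strip()]
        # A column with no values at all is not counted as numerical.
        if non_empty and all(is_numeric(c) for c in non_empty):
            numeric += 1
    return numeric / len(table)


# Hypothetical toy table, not taken from the SportsTables corpus.
toy_table = {
    "player": ["A. Example", "B. Sample"],
    "team": ["Reds", "Blues"],
    "games": ["34", "29"],
    "goals": ["12", "7"],
    "minutes": ["2890", "2310"],
}

share = numeric_column_share(toy_table)
print(f"{share:.0%} numerical columns")  # prints "60% numerical columns"
```

Averaging this per-table share over all tables of a corpus would give a corpus-level figure comparable in spirit to the one quoted in the abstract, though the paper may define numerical columns differently.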