Browsing by Author "Hartmann, Claudio"
1 - 9 of 9
- Text document: Aggregate-based Training Phase for ML-based Cardinality Estimation (BTW 2021, 2021) Woltmann, Lucas; Hartmann, Claudio; Habich, Dirk; Lehner, Wolfgang. Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a lot of training queries have to be executed during the model training phase to learn a data-dependent ML model, making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 63 with our aggregate-based training phase and thus outperform indexes.
- Text document: Assessing the Impact of Driving Bans with Data Analysis (BTW 2019 – Workshopband, 2019) Woltmann, Lucas; Hartmann, Claudio; Lehner, Wolfgang
- Journal article: Feature-aware forecasting of large-scale time series data sets (it - Information Technology: Vol. 62, No. 3-4, 2020) Hartmann, Claudio; Kegel, Lars; Lehner, Wolfgang. The Internet of Things (IoT) sparks a revolution in time series forecasting. Traditional techniques forecast time series individually, which becomes infeasible when the focus changes to thousands of time series exhibiting anomalies like noise and missing values. This work presents CSAR, a technique forecasting a set of time series with only one model, and a feature-aware partitioning applying CSAR on subsets of similar time series. These techniques provide accurate forecasts a hundred times faster than traditional techniques, preparing forecasting for the arising challenges of the IoT era.
- Conference paper: JumpXClass: Explainable AI for Jump Classification in Trampoline Sports (BTW 2023, 2023) Woltmann, Lucas; Ferger, Katja; Hartmann, Claudio; Lehner, Wolfgang. Movement patterns in trampoline gymnastics have become faster and more complex with the increase in the athletes’ capabilities. This makes the assessment of jump type, pose, and quality during training or competitions by humans very difficult or even impossible. To counteract this development, data-driven solutions are thought to be a way to improve training. In recent work, sensor measurements and machine learning are used to automatically predict jumps and give feedback to the athletes and trainers. However, machine learning models, and especially neural networks, are black boxes most of the time. Therefore, the athletes and trainers cannot gain any insights about the jump from the machine learning-based jump classification. To better understand the jump execution during training, we propose JumpXClass: a tool for automatic machine learning-based jump classification with explainable artificial intelligence. Using elements of explainable artificial intelligence can improve the training experience for athletes and trainers. This work will demonstrate a live system capable of classifying and explaining jumps from trampoline athletes.
- Journal: Large-Scale Time Series Analytics (Datenbank-Spektrum: Vol. 19, No. 1, 2019) Hahmann, Martin; Hartmann, Claudio; Kegel, Lars; Lehner, Wolfgang
- Conference paper: Optimizing Query Processing in PostgreSQL Through Learned Optimizer Hints (BTW 2023, 2023) Thiessat, Jerome; Woltmann, Lucas; Hartmann, Claudio; Habich, Dirk. Query optimization in database systems is an important aspect and despite decades of research, it is still far from being solved. Nowadays, query optimizers usually provide hints to be able to steer the optimization on a query-by-query basis. However, setting the best-fitting hints is challenging. To tackle that, we present a learning-based approach to predict the best-fitting hints for each incoming query. In particular, our learning approach is based on simple gradient boosting, where we learn one model per query context for fine-grained predictions rather than a single global context-agnostic model as proposed in related work. We demonstrate the efficiency as well as effectiveness of our learning-based approach using the open-source database system PostgreSQL and show that our approach outperforms related work in that context.
- Journal article: Particulate Matter Matters—The Data Science Challenge @ BTW 2019 (Datenbank-Spektrum: Vol. 19, No. 3, 2019) Meyer, Holger J.; Grunert, Hannes; Waizenegger, Tim; Woltmann, Lucas; Hartmann, Claudio; Lehner, Wolfgang; Esmailoghli, Mahdi; Redyuk, Sergey; Martinez, Ricardo; Abedjan, Ziawasch; Ziehn, Ariane; Rabl, Tilmann; Markl, Volker; Schmitz, Christian; Serai, Dhiren Devinder; Gava, Tatiane Escobar. For the second time, the Data Science Challenge took place as part of the 18th symposium “Database Systems for Business, Technology and Web” (BTW) of the Gesellschaft für Informatik (GI). The Challenge was organized by the University of Rostock and sponsored by IBM and SAP. This year, the integration, analysis and visualization around the topic of particulate matter pollution was the focus of the challenge. After a preselection round, the accepted participants had one month to adapt their developed approach to a substantiated problem, the real challenge. The final presentation took place at BTW 2019 in front of the prize jury and the attending audience. In this article, we give a brief overview of the schedule and the organization of the Data Science Challenge. In addition, the problem to be solved and its solution will be presented by the participants.
- Conference paper: PostBOUND: PostgreSQL with Upper Bound SPJ Query Optimization (BTW 2023, 2023) Bergmann, Rico; Hertzschuch, Axel; Hartmann, Claudio; Habich, Dirk; Lehner, Wolfgang. A variety of query optimization papers have shown the disastrous effect of poor cardinality estimates on the overall run time for arbitrary select-project-join (SPJ) queries. Especially, underestimating join cardinalities for multi-joins can lead to catastrophic join orderings. A promising solution to overcome this problem is query optimization based on upper bounds for the join cardinalities. In this domain, our proposed UES concept is presently the most efficient technique featuring a simple, yet effective upper bound for an arbitrary number of joins. To foster research in that direction, we introduce PostBOUND, our generalized framework making upper bound SPJ query optimization a first class citizen in PostgreSQL. PostBOUND provides abstractions to calculate arbitrary upper bounds, to model joins required by an SPJ query and to iteratively construct an optimized join order. To highlight the extensibility of PostBOUND and to show the research potential, we additionally present two tighter upper bound UES variants using top-k statistics in this paper. In our evaluation, we show the efficiency and applicability of PostBOUND on different workloads as well as using different PostgreSQL versions. Additionally, we evaluate both presented tighter upper bound variant ideas.
- Journal article: Season- and Trend-aware Symbolic Approximation for Accurate and Efficient Time Series Matching (Datenbank-Spektrum: Vol. 21, No. 3, 2021) Kegel, Lars; Hartmann, Claudio; Thiele, Maik; Lehner, Wolfgang. Processing and analyzing time series datasets have become a central issue in many domains requiring data management systems to support time series as a native data type. A core access primitive of time series is matching, which requires efficient algorithms on top of appropriate representations like the symbolic aggregate approximation (SAX) representing the current state of the art. This technique reduces a time series to a low-dimensional space by segmenting it and discretizing each segment into a small symbolic alphabet. Unfortunately, SAX ignores the deterministic behavior of time series such as cyclical repeating patterns or a trend component affecting all segments, which may lead to a sub-optimal representation accuracy. We therefore introduce a novel season- and a trend-aware symbolic approximation and demonstrate an improved representation accuracy without increasing the memory footprint. Most importantly, our techniques also enable a more efficient time series matching by providing a match up to three orders of magnitude faster than SAX.