Browsing by author: "Broneske, David"
Showing 1 - 10 of 13
- Text document: 1st Workshop on Novel Data Management Ideas on Heterogeneous (Co-)Processors (NoDMC) (BTW 2019 – Workshopband, 2019). Broneske, David; Habich, Dirk
- Text document: The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution (BTW 2019, 2019). Chen, Xiao; Campero Durand, Gabriel; Zoun, Roman; Broneske, David; Li, Yang; Saake, Gunter
  Recently, word embedding has become a beneficial technique for diverse natural language processing tasks, especially after the successful introduction of several popular neural word embedding models, such as word2vec, GloVe, and FastText. Entity resolution (ER), i.e., the task of identifying digital records that refer to the same real-world entity, has also been shown to benefit from word embedding. However, the use of word embeddings does not lead to a one-size-fits-all solution, because it cannot provide an accurate result for values without any semantic meaning, such as numerical values. In this paper, we propose to combine general word embedding with traditional hand-picked similarity measures for solving ER tasks, with the aim of selecting the most suitable similarity measure for each attribute based on its properties. We provide some guidelines on how to choose suitable similarity measures for different types of attributes and evaluate our proposed hybrid method on both synthetic and real datasets. Experiments show that a hybrid method that correctly selects the required similarity measures can outperform methods that adopt purely traditional or purely word-embedding-based similarity measures.
- Journal: Cooking DBMS Operations using Granular Primitives (Datenbank-Spektrum: Vol. 18, No. 3, 2018). Gurumurthy, Bala; Broneske, David; Drewes, Tobias; Pionteck, Thilo; Saake, Gunter
- Journal article: Exploiting capabilities of modern processors in data intensive applications (it - Information Technology: Vol. 59, No. 5, 2017). Broneske, David; Saake, Gunter
  In main-memory database systems, the time to process the data has become a limiting factor due to the missing access gap. With processing capabilities (e.g., branch prediction, pipelining) changing in every new CPU architecture, code that was once optimal will probably not stay the best code forever. In this article, we analyze the processing capabilities of the classical CPU and describe code optimizations that exploit these capabilities. Furthermore, we present state-of-the-art compiler techniques that already implement code optimizations, while also showing gaps for further code optimization integration.
- Conference paper: Extending an index-benchmarking framework with non-invasive visualization capability (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013). Broneske, David; Schäler, Martin; Grebhahn, Alexander
  Finding a suitable multi-dimensional index structure for a data-intensive system is not a trivial task. The QuEval framework supports users in finding the best index structure from a list of candidates. Nevertheless, if an index structure proves superior to other index structures most of the time but fails for one data set, we want to know the reason for this phenomenon. To support an understanding of such deficits, a visualization of the partitioning scheme is helpful. Consequently, we propose a visualization component that interacts with QuEval without affecting the performance evaluation. To this end, we use a modern software-engineering approach based on AspectJ to support Digital Engineering of complex solutions.
- Journal article: GridTables: A One-Size-Fits-Most H2TAP Data Store (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Pinnecke, Marcus; Campero Durand, Gabriel; Broneske, David; Zoun, Roman; Saake, Gunter
  Heterogeneous Hybrid Transactional Analytical Processing (H²TAP) database systems have been developed to match the requirements for low-latency analysis of real-time operational data. Due to technical challenges, these systems are hard to architect, non-trivial to engineer, and complex to administer. Current research has proposed excellent solutions to many of those challenges in isolation, but a unified engine that optimizes performance by combining these solutions is still missing. In this concept paper, we suggest a highly flexible and adaptive data structure (called gridtable) to physically organize sparse but structured records in the context of H²TAP. For this, we focus on the design of an efficient, highly flexible storage layout that is built from scratch for mixed query workloads. The key challenges we address are: (1) partial storage in different memory locations, and (2) the ability to optimize for mixed OLTP/OLAP access patterns. To guarantee safe and well-specified data definition or manipulation, as well as fast querying with no compromises on performance, we propose two dedicated access paths to the storage. In this paper, we explore the architecture and internals of gridtables, showing design goals, concepts, and trade-offs. We close this paper with open research questions and challenges that must be addressed in order to take advantage of the flexibility of our solution.
- Conference paper: Hardware-Sensitive Scan Operator Variants for Compiled Selection Pipelines (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Broneske, David; Meister, Andreas; Saake, Gunter
  The ever-increasing demand for performance on huge data sets forces database systems to squeeze the last bit of performance out of their operators. Compiled query plans in particular allow for several tuning opportunities that can be applied depending on the query plan and the underlying data. Apart from classical query optimization opportunities, this includes tuning the code with processor-specific optimizations, e.g., using Single Instruction Multiple Data (SIMD) processing or predication. In this paper, we examine code optimizations that can be applied to compiled scan pipelines that include aggregations, evaluate impact factors that influence the performance of the scan pipelines, and derive guidelines that a query compiler should implement to choose the best variant for a given query plan and workload.
- Journal article: In-Depth Analysis of OLAP Query Performance on Heterogeneous Hardware (Datenbank-Spektrum: Vol. 21, No. 2, 2021). Broneske, David; Drewes, Anna; Gurumurthy, Bala; Hajjar, Imad; Pionteck, Thilo; Saake, Gunter
  Classical database systems are now facing the challenge of processing high-volume data feeds at unprecedented rates as efficiently as possible while also minimizing power consumption. Since CPU-only machines hit their limits, co-processors like GPUs and FPGAs are investigated by database system designers for their distinct capabilities. As a result, database systems over heterogeneous processing architectures are on the rise. In order to better understand their potentials and limitations, in-depth performance analyses are vital. This paper provides performance data by benchmarking a portable operator set for column-based systems on CPU, GPU, and FPGA, all of which are available processing devices within the same system. We consider TPC-H query Q6 and additionally a hash join to profile the execution across the systems. We show that system memory access and/or buffer management remains the main bottleneck for device integration, and that architecture-specific execution engines and operators offer significantly higher performance.
- Journal: Integration of FPGAs in Database Management Systems: Challenges and Opportunities (Datenbank-Spektrum: Vol. 18, No. 3, 2018). Becher, Andreas; B.G., Lekshmi; Broneske, David; Drewes, Tobias; Gurumurthy, Bala; Meyer-Wegener, Klaus; Pionteck, Thilo; Saake, Gunter; Teich, Jürgen; Wildermann, Stefan
- Text document: MSDataStream – Connecting a Bruker Mass Spectrometer to the Internet (BTW 2019, 2019). Zoun, Roman; Schallert, Kay; Broneske, David; Fenske, Wolfram; Pinnecke, Marcus; Heyer, Robert; Brehmer, Sven; Benndorf, Dirk; Saake, Gunter
  Metaproteomics is the biological study of the proteins of whole communities, comprising thousands of species, using tandem mass spectrometry. However, it still follows a sequential, non-parallelizable workflow. Hence, researchers have to wait for hours or even days until the measurement data are available. In our demo, we show a way to reduce the smallest unit of the workflow to a minimum in order to realize a near-real-time stream processing system on a fast data architecture.