Browsing by author "Neumann, Thomas"
- Conference paper: Algebraic Query Optimization for Distributed Top-k Queries (Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2007). Neumann, Thomas; Michel, Sebastian. Distributed top-k query processing is increasingly becoming an essential functionality in a large number of emerging application classes. This paper addresses the efficient algebraic optimization of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We use a dynamic programming approach to find the optimal execution plan, using compact data synopses for selectivity estimation, which is the basis for our cost model. The optimized query is executed in a hierarchical way involving a small and fixed number of communication phases. We have performed experiments on real web data that show the benefits of distributed top-k query optimization both in network resource consumption and query response time.
- Conference paper: Benchmarking hybrid OLTP&OLAP database systems (Datenbanksysteme für Business, Technologie und Web (BTW), 2011). Funke, Florian; Kemper, Alfons; Neumann, Thomas. Recently, the case has been made for operational or real-time Business Intelligence (BI). As the traditional separation into OLTP database and OLAP data warehouse obviously incurs severe latency disadvantages for operational BI, hybrid OLTP&OLAP database systems are being developed. The advent of the first generation of such hybrid OLTP&OLAP database systems requires means to characterize their performance. While there are standardized and widely used benchmarks addressing either OLTP or OLAP workloads, the lack of a hybrid benchmark led us to the definition of a new mixed workload benchmark, called TPC-CH. This benchmark bridges the gap between the existing single-workload suites: TPC-C for OLTP and TPC-H for OLAP. The newly proposed TPC-CH benchmark executes a mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite on this sales database. As it is derived from these two most widely used TPC benchmarks, our new TPC-CH benchmark produces results that are highly comparable to both hybrid systems and classic single-workload systems. Thus, we are able to compare the performance of our own (and other) hybrid database systems running both OLTP and OLAP workloads in parallel with the OLTP performance of dedicated transactional systems (e.g., VoltDB) and the OLAP performance of specialized OLAP databases (e.g., column stores such as MonetDB).
- Journal article: Bericht vom Herbsttreffen der GI-Fachgruppe Datenbanksysteme [Report from the autumn meeting of the GI special interest group on database systems] (Datenbank-Spektrum: Vol. 13, No. 1, 2013). Kemper, Alfons; Mühlbauer, Tobias; Neumann, Thomas; Reiser, Angelika; Rödiger, Wolf.
- Conference paper: Bestimmung der semantischen Eigenschaften von Datenstromsystemen durch Black-Box-Tests [Determining the semantic properties of data stream systems through black-box tests] (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013). Lauterwald, Frank; Pollner, Niko; Meyer-Wegener, Klaus. The semantics of data stream systems (DSS) have not yet been standardized. For application developers, however, it is important to know how a particular system behaves in a particular situation. This behavior is equally significant for federated data stream systems, which automatically distribute queries across different DSS. Semantic models can serve as a descriptive aid. These models are parameterized and can reproduce the behavior of different systems through different parameter values. Since no generally accepted model for describing DSS exists so far either, one may have to deal with several models. It would therefore be helpful to largely automate the determination of the respective parameter values, and this contribution presents a suitable evaluation environment for that purpose. The environment compares the outputs of a DSS with all predictions that a model can make for different parameters. If the results match, the parameters have been found. Experiences with this approach and its limitations are discussed.
- Text document: B²-Tree (BTW 2021, 2021). Schmeißer, Josef; Schüle, Maximilian E.; Leis, Viktor; Neumann, Thomas; Kemper, Alfons. Recently proposed index structures that combine trie-based and comparison-based search mechanisms considerably improve retrieval throughput for in-memory database systems. However, most of these index structures allocate small memory chunks when required. This stands in contrast to block-based index structures, which are necessary for the disk accesses of beyond-main-memory database systems such as Umbra. We therefore present the B²-tree. The outer structure is identical to that of an ordinary B+-tree. It still stores elements in a dense array in sorted order, enabling efficient range scan operations. However, the B²-tree is composed of multiple trees: each page integrates another trie-based search tree, which is used to determine a small memory region where a sought entry may be found. An embedded tree thereby consists of decision nodes, which operate on a single byte at a time, and span nodes, which are used to store common prefixes. This architecture usually accesses fewer cache lines than a vanilla B+-tree, as our performance evaluation showed. As a result, the B²-tree is able to answer point queries considerably faster.
- Conference paper: CDIM - call for papers (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013). Nürnberger, Andreas; Balke, Wolf-Tilo. The first CDIM workshop on crowd-enabled data and information management was held in conjunction with the 15th GI-Fachtagung Datenbanksysteme für Business, Technologie und Web (BTW), Magdeburg, Germany, 2013.
- Conference paper: Communication-Optimal Parallel Reservoir Sampling (BTW 2023, 2023). Winter, Christian; Sichert, Moritz; Birler, Altan; Neumann, Thomas; Kemper, Alfons. When evaluating complex analytical queries on high-velocity data streams, many systems cannot run those queries on all elements of a stream. Sampling is a widely used method to reduce the system load by replacing the input with a representative yet manageable subset. For unbounded data, reservoir sampling generates a fixed-size uniform sample independent of the input cardinality. However, the collection of reservoir samples itself can already be a bottleneck for high-velocity data. In this paper, we introduce a technique that allows fully parallelizing reservoir sampling for many-core architectures. Our approach relies on the efficient combination of thread-local samples taken over chunks of the input without necessitating communication during the sampling phase and with minimal communication when merging. We show how our efficient merge guarantees uniform random samples while allowing data to be distributed over worker threads arbitrarily. Our analysis of this approach within the Umbra database system demonstrates linear scaling along the available threads and the ability to sustain high-velocity workloads.
- Conference paper: The Complete Story of Joins (in HyPer) (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Neumann, Thomas; Leis, Viktor; Kemper, Alfons. SQL has evolved into an (almost) fully orthogonal query language that allows (arbitrarily deeply) nested subqueries in nearly all parts of the query. In order to avoid recursive evaluation strategies, which incur unbearable O(n²) runtime, we need an extended relational algebra to translate such subqueries into non-standard join operators. This paper concentrates on the non-standard join operators beyond the classical textbook inner joins, outer joins and (anti) semi joins. Their implementations in HyPer were covered in previous publications, which we refer to. In this paper we cover the new join operators mark-join and single-join at both levels: at the logical level we show the translation and reordering possibilities in order to effectively optimize the resulting query plans. At the physical level we describe hash-based and block-nested loop implementations of these new joins. Based on our database system HyPer, we describe a blueprint for the complete query translation and optimization pipeline. The practical need for the advanced join operators is proven by an analysis of the two well-known TPC-H and TPC-DS benchmarks, which revealed that all variants are actually used in these query sets.
- Conference paper: Concept for a web based support of the development process (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013). Oellrich, Marc; Mantwill, Frank. In recent years there have been some developments on the internet which might support the product development process. Some ideas at the beginning of the millennium showed that web based systems can raise efficiency, but the possibilities are much greater nowadays. While representations at that time were only static, they can now be handled in a much more user-friendly way and gain acceptance, as shown by Wikipedia or Facebook. Interesting further opportunities are offered by Open Innovation, where problems are solved by a large number of online users. This concept shows the principal components of an integrated web based system, which supports the methodological development approach, reduces the workload of collecting and entering redundant data, allows collaborative and partially asynchronous cooperation, and contributes to determining and mapping the knowledge and experience of the employees, which should lead to a higher ability to compete and give a clear competitive advantage.
- Conference paper: D-VITA: A visual interactive text analysis system using dynamic topic mining (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013). Günnemann, Nikou. Recent developments in web technologies like Web 2.0 have led to the generation of massive amounts of data. The rapid growth of data makes knowledge extraction and trend prediction a challenging task. A recent approach for the unsupervised analysis of text corpora is dynamic topic mining. While there is a growing interest in using this technique, interactive analysis systems for dynamic topic mining are still in an early stage. In this paper we present D-VITA, an interactive text analysis system that exploits dynamic topic mining to detect the latent topic structure and topic dynamics in a collection of documents. D-VITA supports end-users in understanding and exploiting the topic mining results, in visualizing the topic dynamics within document collections, and in browsing documents based on shared topics. We present an application case for a scientific community that uses an instance of D-VITA for trend analysis in their data sources.