P214 - BTW2013 - Datenbanksysteme für Business, Technologie und Web
Latest publications
- Conference paper: Logical recovery from single-page failures (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Graefe, Goetz; Seeger, Bernhard.
  Modern hardware technologies and ever-increasing data sizes increase the probability and frequency of local storage failures, e.g., unrecoverable read errors on individual disk sectors or pages on flash storage. Our prior work has formalized single-page failures and outlined efficient methods for their detection and recovery. These prior techniques rely on old backup copies of individual pages, e.g., as part of a database backup or as old versions retained after a page migration. Such copies might not be available, however, e.g., after a recent index creation in “non-logged” or “allocation-only logging” mode, which industrial database products commonly use. The present paper introduces techniques for single-page recovery without backup copies, e.g., for pages of new indexes created in allocation-only logging mode. By rederiving the lost contents of individual pages, these techniques enable efficient recovery of data lost due to damaged storage structures or storage devices. Recovery performance depends on the size of the failure and of the required data sources; it is independent of the size of the device, the index structure, etc.
- Conference paper: Extending the MPSM join (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Albutiu, Martina-Cezara; Kemper, Alfons; Neumann, Thomas.
  Hardware vendors are improving their (database) servers in two main aspects: (1) increasing main memory capacities to several TB per server, mostly with non-uniform memory access (NUMA) among sockets, and (2) massively parallel multi-core processing. While there has been research on the parallelization of database operations, many algorithmic and control techniques in current database technology were still devised for disk-based systems, where I/O dominated performance. Furthermore, NUMA has only recently caught the community's attention. In [AKN12], we analyzed the challenges that modern hardware poses to database algorithms on a 32-core machine with 1 TB of main memory (four NUMA partitions) and derived three rather simple rules for NUMA-affine scalable multi-core parallelization. Based on our findings, we developed MPSM, a suite of massively parallel sort-merge join algorithms, and showed its competitive performance on large main-memory databases with billions of objects. In this paper, we go one step further and investigate the effectiveness of MPSM for non-inner join variants and complex query plans. We show that MPSM incurs no extra overhead for non-inner join variants. Further, we point out ways of exploiting the roughly sorted output of MPSM in subsequent joins. In our evaluation, we compare these ideas to the basic execution of sequential MPSM joins and find that the original MPSM performs very well in complex query plans.
- Conference paper: Resource description and selection for range query processing in general metric spaces (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Blank, Daniel; Henrich, Andreas.
  Similarity search in general metric spaces is a key aspect of many application fields. Metric space indexing provides a flexible indexing paradigm and is based solely on the use of a distance metric; no assumption is made about the representation of the database objects. Nowadays, ever-increasing data volumes require large-scale distributed retrieval architectures, in which local and global indexing schemes are distinguished. In the local indexing approach, every resource administers a set of documents and indexes them locally. Resource descriptions, which provide the basis for resource selection, can be disseminated so that not all resources have to be contacted when answering a query. Global indexing schemes, on the other hand, are based on a single index that is distributed so that every resource is responsible for a certain part of the index. For local indexing, only a few exact approaches that support general metric space indexing have been proposed. In this paper, we introduce RS4MI, an exact resource selection approach for general metric space indexing. We compare RS4MI with approaches from the literature in a peer-to-peer scenario of searching for similar images by image content. For range queries, RS4MI can outperform two exact resource selection schemes for general metric spaces: it contacts fewer resources while at the same time using more space-efficient resource descriptions.
- Conference paper: Taking the edge off cardinality estimation errors using incremental execution (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Neumann, Thomas; Galindo-Legaria, Cesar.
  Query optimization is an essential ingredient of efficient query processing, as semantically equivalent execution alternatives can have vastly different runtime behavior. The query optimizer is largely driven by cardinality estimates when selecting execution alternatives. Unfortunately, these estimates are largely inaccurate, in particular for complex predicates or skewed data. We present an incremental execution framework that makes the query optimizer more resilient to cardinality estimation errors. The framework computes the sensitivity of execution plans to cardinality estimation errors and, if necessary, executes parts of the query to remove uncertainty. This technique avoids optimization decisions based on gross misestimates and makes query optimization (and thus processing) much more robust. We demonstrate the effectiveness of these techniques on large real-world and synthetic data sets.
- Edited book
- Conference paper: PeRA: individual privacy control in intelligent transportation systems (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Kost, Martin; Dzikowski, Raffael; Freytag, Johann-Christoph.
  In the domain of Intelligent Transportation Systems (ITS), manufacturers and service providers are starting to implement and deploy many (new) applications that run on a vehicle. These applications involve the user and external services. Therefore, we must incorporate mechanisms that allow individuals to control their privacy. Existing approaches only consider controlling the event of data access using a central instance. In contrast, we aim to implement individual privacy requirements for the complete data flow of distributed systems. The Privacy-enforcing Runtime Architecture (PeRA) provides a holistic privacy protection approach that implements user-defined privacy policies. A data-centric protection chain ensures that ITS components process data according to the attached privacy policies. PeRA instances constitute a distributed privacy middleware, which evaluates privacy policies to mediate data access by applications. The PeRA architecture includes an integrity protection layer that creates a distributed policy enforcement perimeter between ITS nodes, which prevents the circumvention of policies. We implemented the PeRA architecture as a proof-of-concept prototype.
- Conference paper: ProQua: Ein Probabilistisches Datenbanksystem für die Auswertung von Ähnlichkeitsanfragen auf unsicheren Datengrundlagen [ProQua: a probabilistic database system for evaluating similarity queries on uncertain data] (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Lehrack, Sebastian; Saretz, Sascha; Winkel, Christian.
  ProQua is a novel probabilistic database system designed to evaluate weighted, logic-based similarity conditions on an uncertain database. The main features of ProQua are presented using an example scenario from the field of archaeology.
- Conference paper: MR-DSJ: distance-based self-join for large-scale vector data analysis with mapreduce (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Seidl, Thomas; Fries, Sergej; Boden, Brigitte.
  Data analytics faces huge and tremendously increasing amounts of data, for which MapReduce provides a very convenient and effective distributed programming model. Various algorithms already support massive data analysis on computer clusters, but distance-based similarity self-joins in particular lack efficient solutions for large vector data sets, although they are fundamental to many data mining tasks, including clustering, near-duplicate detection, and outlier analysis. Our novel distance-based self-join algorithm for MapReduce, MR-DSJ, is based on grid partitioning and delivers correct, complete, and inherently duplicate-free results in a single iteration. Additionally, we propose several filter techniques that reduce the runtime and communication of the MR-DSJ algorithm. Analytical and experimental evaluations demonstrate its superiority over other join algorithms for MapReduce.
- Conference paper: A mutual pruning approach for RkNN join processing (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Emrich, Tobias; Kröger, Peer; Niedermayer, Johannes; Renz, Matthias; Züfle, Andreas.
  A reverse k-nearest neighbor (RkNN) query determines the objects from a database that have the query object as one of their k nearest neighbors. Processing such queries has received plenty of attention in research. However, the effect of running multiple RkNN queries at once (join) or within a short time interval (bulk/group query) has, to the best of our knowledge, not been addressed so far. In this paper, we analyze RkNN joins and discuss possible solutions to this problem. In our performance analysis, we provide evaluation results showing the I/O and CPU performance of the compared algorithms for a variety of setups.
- Conference paper: DrillBeyond: open-world SQL queries using web tables (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Eberius, Julian; Thiele, Maik; Braunschweig, Katrin; Lehner, Wolfgang.
  The Web consists of a huge number of documents, but also of large amounts of structured information, for example in the form of HTML tables containing relational-style data. One typical usage scenario for this kind of data is its integration into a database or data warehouse in order to apply data analytics. However, today's business intelligence tools clearly lack support for so-called situational or ad-hoc data integration. In this demonstration, we therefore present DrillBeyond, a novel database and information retrieval engine that allows users to query a local database as well as web datasets in a seamless and integrated way with standard SQL. The audience will be able to pose queries to our DrillBeyond system, which will be answered partly from local data in the database and partly from datasets that originate from the Web of Data. We will demonstrate the integration of the web tables back into the DBMS in order to apply its analytical features.
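The MPSM abstract above refers to a suite of massively parallel sort-merge joins. As a rough illustration only, the sketch below simulates one way such a join can be organized, sequentially and under our own assumptions: one input (called "public" here) is cut into one chunk per worker and each chunk is sorted locally, the other ("private") input is range-partitioned by join key using caller-supplied splitter values and each partition is sorted, and each worker then merge-joins its private run with every public run. The names `mpsm_join`, `merge_join`, and `splitters` are hypothetical. There are no threads, no NUMA placement, and no splitter computation, so this is not the authors' implementation.

```python
from bisect import bisect_right
from operator import itemgetter

by_key = itemgetter(0)  # assumption: tuples are (join_key, payload)

def merge_join(r_run, s_run):
    """Equi-join two lists of (key, payload) tuples that are both sorted by key."""
    out, i, j = [], 0, 0
    while i < len(r_run) and j < len(s_run):
        rk, sk = r_run[i][0], s_run[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the group of equal keys on each side and emit its cross product.
            i_end, j_end = i, j
            while i_end < len(r_run) and r_run[i_end][0] == rk:
                i_end += 1
            while j_end < len(s_run) and s_run[j_end][0] == rk:
                j_end += 1
            for _, r_payload in r_run[i:i_end]:
                for _, s_payload in s_run[j:j_end]:
                    out.append((rk, r_payload, s_payload))
            i, j = i_end, j_end
    return out

def mpsm_join(R, S, workers, splitters):
    """Sequential simulation of a parallel sort-merge equi-join.

    splitters (hypothetical parameter): sorted list of workers - 1 key boundaries;
    worker w receives the R tuples whose keys fall into the w-th key range.
    """
    # Phase 1: split the public input S into one chunk per worker; sort each chunk locally.
    chunk = -(-len(S) // workers)  # ceiling division
    s_runs = [sorted(S[w * chunk:(w + 1) * chunk], key=by_key) for w in range(workers)]
    # Phase 2: range-partition the private input R by join key.
    r_parts = [[] for _ in range(workers)]
    for t in R:
        r_parts[bisect_right(splitters, t[0])].append(t)
    # Phase 3: each worker sorts its private partition.
    r_runs = [sorted(part, key=by_key) for part in r_parts]
    # Phase 4: each worker merge-joins its private run with every public run.
    # (This sketch scans each public run completely; it does not skip key ranges.)
    result = []
    for r_run in r_runs:
        for s_run in s_runs:
            result.extend(merge_join(r_run, s_run))
    return result

R = [(3, "r1"), (1, "r2"), (7, "r3"), (3, "r4")]
S = [(3, "s1"), (7, "s2"), (2, "s3"), (3, "s4"), (9, "s5")]
print(sorted(mpsm_join(R, S, workers=2, splitters=[5])))
# [(3, 'r1', 's1'), (3, 'r1', 's4'), (3, 'r4', 's1'), (3, 'r4', 's4'), (7, 'r3', 's2')]
```

In this arrangement, each worker writes only its own output and reads the sorted public runs sequentially, so no shared hash table is needed. The result arrives as concatenated sorted runs, which is one reading of the "roughly sorted output" mentioned in the abstract.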
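The resource selection problem in the RS4MI abstract can be illustrated with the simplest exact resource description for a metric space: one center object plus a covering radius per resource. This is a generic technique and not RS4MI itself. The names `BallDescription`, `describe`, and `select_resources` are hypothetical. Only the distance function is used. By the triangle inequality, d(q, o) >= d(q, c) - d(c, o) >= d(q, c) - radius, so a resource with d(q, c) - radius > r cannot contain an answer to the range query (q, r) and can be skipped without losing results.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

Metric = Callable[[Any, Any], float]

@dataclass
class BallDescription:
    """Hypothetical per-resource summary: a center object and a covering radius."""
    center: Any
    radius: float  # max distance from center to any object stored at the resource

def describe(objects: Sequence[Any], center: Any, dist: Metric) -> BallDescription:
    """Summarize one resource's local objects by a ball around `center`."""
    return BallDescription(center, max(dist(center, o) for o in objects))

def select_resources(descs: Sequence[BallDescription], q: Any, r: float,
                     dist: Metric) -> List[int]:
    """Indices of resources that may hold objects o with dist(q, o) <= r.

    Triangle inequality: dist(q, o) >= dist(q, c) - dist(c, o) >= dist(q, c) - radius,
    so a resource with dist(q, c) - radius > r holds no answer and is skipped.
    """
    return [i for i, d in enumerate(descs) if dist(q, d.center) - d.radius <= r]

# Usage with Euclidean distance; any metric works because only dist() is called.
resources = [[(0, 0), (1, 1)], [(10, 10), (11, 10)], [(5, 5)]]
descs = [describe(objs, objs[0], math.dist) for objs in resources]
print(select_resources(descs, (0.5, 0.5), 1.0, math.dist))  # [0]
```

The pruning is exact (no resource holding an answer is ever skipped), but it can be loose when a resource's objects are spread out, because the whole resource is then covered by one large ball.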
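The grid partitioning named in the MR-DSJ abstract can be sketched in a single process. A simulated map step assigns each vector to a grid cell of side length eps, and a simulated reduce step compares each cell with itself and with its adjacent cells. To keep the output duplicate-free, each unordered pair of adjacent cells is compared only once, by visiting only neighbor offsets that are lexicographically greater than the zero vector. This is our own illustration of a grid-based epsilon self-join, assuming Euclidean distance. `grid_self_join` is a hypothetical name, and MR-DSJ's actual partitioning, data replication, and filter techniques are not reproduced here.

```python
import math
from collections import defaultdict
from itertools import product

def grid_self_join(points, eps):
    """All index pairs (i, j), i < j, whose Euclidean distance is at most eps."""
    # Simulated map step: key every point by the grid cell (side length eps) containing it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / eps) for c in p)].append(idx)

    # Points within eps of each other lie in the same cell or in adjacent cells.
    # Keeping only offsets lexicographically greater than the zero vector visits
    # each unordered pair of adjacent cells exactly once, so no pair is emitted twice.
    dim = len(points[0]) if points else 0
    forward = [o for o in product((-1, 0, 1), repeat=dim) if o > (0,) * dim]

    # Simulated reduce step: join each cell with itself and with its forward neighbors.
    result = []
    for key, members in cells.items():
        for a, i in enumerate(members):
            for j in members[a + 1:]:  # members are in ascending index order
                if math.dist(points[i], points[j]) <= eps:
                    result.append((i, j))
        for off in forward:
            neighbor = cells.get(tuple(k + o for k, o in zip(key, off)), [])
            for i in members:
                for j in neighbor:
                    if math.dist(points[i], points[j]) <= eps:
                        result.append((min(i, j), max(i, j)))
    return sorted(result)

pts = [(0, 0), (0.5, 0), (3, 3), (3.4, 3.2), (1.2, 0)]
print(grid_self_join(pts, 1.0))  # [(0, 1), (1, 4), (2, 3)]
```

Points 0 and 4 are in adjacent cells but 1.2 apart, so the distance check drops that pair. The number of neighbor cells grows as 3^d, which is one reason grid-based joins need additional care for higher-dimensional vectors.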
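As a reference point, the RkNN definition in the corresponding abstract translates directly into a naive nested-loop algorithm: a database object o belongs to RkNN(q) if fewer than k other database objects are strictly closer to o than q is. Running this once per query object gives a naive RkNN join, the kind of baseline that pruning strategies aim to improve on. The functions `rknn` and `rknn_join`, the Euclidean distance, and the tie handling (ties count in favor of q) are our own assumptions. This is not the mutual pruning approach of the paper.

```python
import math

def rknn(db, q, k):
    """Indices of database objects that have q among their k nearest neighbors.

    Object o qualifies if fewer than k other database objects are strictly
    closer to o than q is (ties are resolved in favor of q).
    """
    result = []
    for i, o in enumerate(db):
        d_q = math.dist(o, q)
        closer = sum(1 for j, p in enumerate(db) if j != i and math.dist(o, p) < d_q)
        if closer < k:
            result.append(i)
    return result

def rknn_join(db, queries, k):
    """Naive RkNN join: one independent RkNN query per query object."""
    return {qi: rknn(db, q, k) for qi, q in enumerate(queries)}

db = [(0, 0), (1, 0), (5, 0)]
print(rknn_join(db, [(0.4, 0), (4.8, 0)], k=1))  # {0: [0, 1], 1: [2]}
```

Each query costs a number of distance computations quadratic in the database size, and the independent queries share no work. This is why answering many RkNN queries jointly leaves room for shared pruning.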