P241 - BTW2015 - Datenbanksysteme für Business, Technologie und Web

Latest publications

1 - 10 of 53
  • Conference paper
    Privacy preserving record linkage with ppjoin
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Sehili, Ziad; Kolb, Lars; Borgs, Christian; Schnell, Rainer; Rahm, Erhard
    Privacy-preserving record linkage (PPRL) is becoming increasingly important for matching and integrating records that contain sensitive data. PPRL not only has to preserve the anonymity of the persons or entities involved but should also be highly efficient and scalable to large datasets. We therefore investigate how to adapt PPJoin, one of the fastest approaches for regular record linkage, to PPRL, resulting in a new approach called P4Join. The use of bit vectors for PPRL also allows us to devise a parallel execution of P4Join on GPUs. We evaluate the new approaches and compare their efficiency with a PPRL approach based on multibit trees.
  • Conference paper
    D2Pt: Privacy-Aware Multiparty Data Publication
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Nielsen, Jan Hendrik; Janusz, Daniel; Taeschner, Jochen; Freytag, Johann-Christoph
    Today, the publication of medical data faces high legal barriers. On the one hand, publishing medical data is important for medical research. On the other hand, it is necessary to protect people's privacy by ensuring that the relationship between individuals and their medical data remains unknown to third parties. Various data anonymization techniques remove as little identifying information as possible to maintain high data utility while satisfying the strict demands of privacy laws. Current research in this area proposes a multitude of concepts for data anonymization. The concept of k-anonymity allows data publication by hiding identifying information without losing its semantics. Building on k-anonymity, the concept of t-closeness incorporates semantic relationships between personal data values, thereby increasing the strength of the anonymization. However, these concepts are restricted to a centralized data source. In this paper, we extend existing data privacy mechanisms to enable joint data publication among multiple participating institutions. In particular, we adapt the concept of t-closeness to distributed data anonymization. We introduce Distributed two-Party t-closeness (D2Pt), a protocol that uses cryptographic algorithms to avoid a central component when anonymizing data according to the t-closeness property. That is, we achieve data privacy based on the notion of t-closeness without a trusted third party.
  • Conference paper
    The cache sketch: revisiting expiration-based caching in the age of cloud data management
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Gessert, Felix; Schaarschmidt, Michael; Wingerath, Wolfram; Friedrich, Steffen; Ritter, Norbert
    The expiration-based caching model of the web is generally considered irreconcilable with the dynamic workloads of cloud database services, where expiration dates are not known in advance. In this paper, we present the Cache Sketch, a data structure that makes expiration-based caching of database records feasible with rich, tunable consistency guarantees. The Cache Sketch enables database services to leverage the large existing caching infrastructure of content delivery networks, browser caches, and web caches to provide low latency and high scalability. The Cache Sketch employs Bloom filters to create compact representations of potentially stale records, transferring the task of cache coherence to clients. Furthermore, it minimizes the number of invalidations the service has to perform on caches that support them (e.g., CDNs). With different age-control policies, the Cache Sketch achieves very high cache hit ratios with arbitrarily low stale read probabilities. We present the Constrained Adaptive TTL Estimator to provide cache expiration dates that optimize the performance of the Cache Sketch and invalidations. To quantify the performance gains and to derive workload-optimal Cache Sketch parameters, we introduce the YCSB Monte-Carlo Caching Simulator (YMCA), a generic framework for simulating the performance and consistency characteristics of any caching and replication topology. We also provide empirical evidence for the efficiency of the Cache Sketch construction and for the real-world latency reductions of database workloads under CDN caching.
  • Conference paper
    Towards Automated Polyglot Persistence
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Schaarschmidt, Michael; Gessert, Felix; Ritter, Norbert
    In this paper, we present an innovative solution for providing automated polyglot persistence based on service level agreements defined over functional and non-functional requirements of database systems. Complex applications require polyglot persistence to deal with a wide range of database-related needs. Until now, the overhead and the know-how required to manage multiple database systems have prevented many applications from employing efficient polyglot persistence solutions. Instead, developers are often forced to implement one-size-fits-all solutions that do not scale well and cannot easily be upgraded. Therefore, we introduce the concept of a Polyglot Persistence Mediator (PPM), which allows for runtime decisions on routing data to different backends according to schema-based annotations. This enables applications either to employ polyglot persistence right from the beginning or to adopt new systems at any point with minimal overhead. We have implemented and evaluated the concept of automated polyglot persistence in a REST-based Database-as-a-Service setting. Evaluations on various EC2 setups show a scalable write-performance increase of 50-100% for a typical polyglot persistence scenario as well as drastically reduced latencies for reads and queries.
  • Conference paper
    Sequential pattern mining of multimodal streams in the humanities
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Hassani, Marwan; Beecks, Christian; Töws, Daniel; Serbina, Tatiana; Haberstroh, Max; Niemietz, Paula; Jeschke, Sabina; Neumann, Stella; Seidl, Thomas
    Research in the humanities increasingly draws on data mining and data management techniques to deal efficiently with complex scientific corpora. In particular, the exploration of hidden patterns within different types of data streams arising from psycholinguistic experiments is of growing interest in the area of translation process research. To support psycholinguistic experts in quantitatively discovering the non-self-explanatory behavior of the data, we propose the e-cosmos miner framework for mining, generating, and visualizing sequential patterns hidden within multimodal streaming data. The introduced MSS-BE algorithm, based on the PrefixSpan method, searches for sequential patterns within multiple streaming inputs arriving from eye-tracking and keystroke-logging data recorded during translation tasks. The e-cosmos miner enables psycholinguistic experts to select different sequential patterns as they appear in the translation process, compare the evolving changes in their statistics during the process, and track their occurrences within a special simulator.
  • Conference paper
    Learning SQL Basics through Play with the Text Adventure SQL Island
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Schildgen, Johannes; Deßloch, Stefan
    We present SQL Island, a novel browser-based learning game built on the concept of text adventures. After a plane crash, the player character lands on an island. The player talks to inhabitants, collects items, and fights villains. What makes this game special, however, is that the player controls the character exclusively through SQL statements. All necessary statements are introduced first, so no prior experience is required. After about one hour of play, the player has mastered SELECT, UPDATE, and DELETE queries as well as grouping, aggregate functions, and joins. The game can be played online at
  • Conference paper
    Visualizing the behavior of an elastic, energy-efficient database cluster
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Ganza, Sandy; Psota, Thomas; Schall, Daniel; Härder, Theo
    Energy efficiency in databases is an emerging topic. Our research prototype WattDB dynamically adjusts the number of active servers in a cluster to the current workload to achieve energy proportionality. In this demo, we give insights into the partitioning process and WattDB's reaction to workload changes by presenting a monitoring GUI live. The whole process and the resulting configuration are visualized to give immediate feedback on how the cluster reacts.
  • Conference paper
    KitMig - Flexible Live Migration in Multi-Tenant Database Systems
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Göbel, Andreas; Sufryd, Marcel
    Multi-tenant database systems allow a large number of tenants to share physical resources. Their use enables providers of cloud database services to reduce operating costs through high resource utilization and economies of scale. In multi-tenant database systems, the migration of tenants within a server farm proves to be a key component for elasticity, load balancing, and maintainability. These areas of application, however, place different and in part incompatible requirements on a migration. Because of their static procedure, existing approaches to live migration are suitable in only a few cases. In this paper, we present KitMig, a framework for live migration in multi-tenant database systems. Modeled on a construction kit, it provides various modules for determining the migration procedure. A suitable combination of modules allows the procedure to be adapted to the given requirements. The paper describes the KitMig phases, the associated modules, and the implementation in the open-source DBMS H2. Several experiments demonstrate the characteristics of different module combinations and the applicability of the resulting procedures in selected areas of application.
  • Conference paper
    MV-IDX: Multi-Version Index in Action
    (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Gottstein, Robert; Goyal, Rohit; Petrov, Ilia; Hardock, Sergey; Buchmann, Alejandro
    Multi-Versioning DBMS (MV-DBMS) are a very good match for the properties of Flash storage, and combining the two offers conceptual advantages. Yet the specifics of indexing in MV-DBMS on Flash have been largely neglected. Although an index in an MV-DBMS references multiple versions of a data item, it may return at most one version of that data item: the one "visible" to the current index operation. Logically separating version visibility checks from the index structure and operations, as in the traditional version-oblivious index, leads to version management overhead: to determine the appropriate version of a data item, the MV-DBMS first fetches all versions that match the search criteria and then discards invisible versions according to the visibility criteria. This incurs unnecessary I/Os to fetch tuple versions that need not be checked. We propose that version-aware indexing take on the additional responsibility of recognizing the different tuple versions of a single data item and filtering out invisible tuple versions in order to avoid unnecessary I/Os. In this work, we demonstrate an approach called Multi-Version Index (MV-IDX) that allows index-only visibility checks, which significantly reduce the amount of I/O as well as the index maintenance overhead. MV-IDX is implemented in PostgreSQL, an open-source MV-DBMS. We demonstrate that MV-IDX achieves significantly lower response times and higher transactional throughput on OLTP workloads than the version-oblivious approach. We showcase latency and throughput improvements using the DBT2 TPC-C benchmarking tool and report the I/Os saved. We also show how the proposed approach performs better on SSDs.
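The P4Join entry matches records that are encoded as bit vectors. A minimal Python sketch of the underlying comparison follows, assuming each record has already been encoded as a Bloom-filter bit vector (held here as a Python int) and that similarity is the Jaccard (Tanimoto) coefficient of set bits. It applies only a PPJoin-style length filter before computing similarity; PPJoin's prefix and position filters, the P4Join specifics, and the GPU parallelization are not reproduced, and `tanimoto` and `match_pairs` are hypothetical names.

```python
# Sketch only: records are assumed to be pre-encoded Bloom-filter bit vectors (Python ints).
# Only a PPJoin-style length filter is shown; prefix and position filters are omitted.

def popcount(x):
    return bin(x).count("1")

def tanimoto(a, b):
    """Jaccard/Tanimoto similarity of two bit vectors: |a AND b| / |a OR b|."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0

def match_pairs(records, t):
    """Hypothetical helper: return (id_a, id_b, similarity) for all pairs with similarity >= t."""
    # Sort by number of set bits so the length filter can stop the inner loop early.
    items = sorted(records.items(), key=lambda kv: popcount(kv[1]))
    result = []
    for i, (id_a, vec_a) in enumerate(items):
        card_a = popcount(vec_a)
        for id_b, vec_b in items[i + 1:]:
            # Length filter: Jaccard(a, b) <= card_a / card_b, so once card_a < t * card_b,
            # no later (equal or larger) vector can reach the threshold.
            if card_a < t * popcount(vec_b):
                break
            sim = tanimoto(vec_a, vec_b)
            if sim >= t:
                result.append((id_a, id_b, sim))
    return result
```

For example, `match_pairs({"r1": 0b1111, "r2": 0b0111, "r3": 0b10000000}, 0.7)` skips every pair involving `r3` via the length filter and reports only `("r2", "r1", 0.75)`.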
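For the D2Pt entry, the privacy property involved can be illustrated with a centralized, single-party t-closeness check. This Python sketch assumes a categorical sensitive attribute whose values are all equally distant from one another. In that case, the Earth Mover's Distance used by t-closeness reduces to half the L1 (total variation) distance between two distributions. It does not implement the two-party cryptographic protocol of D2Pt, and the function names and sample columns are illustrative.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each sensitive value."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def emd_equal_distance(p, q):
    """EMD under an equal ground distance: half the L1 (total variation) distance."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def is_t_close(rows, qi_cols, sensitive_col, t):
    """Centralized check (not D2Pt): every equivalence class over the quasi-identifiers
    must have a sensitive-value distribution within distance t of the overall one."""
    overall = distribution([r[sensitive_col] for r in rows])
    classes = {}
    for r in rows:
        key = tuple(r[c] for c in qi_cols)
        classes.setdefault(key, []).append(r[sensitive_col])
    return all(emd_equal_distance(distribution(v), overall) <= t for v in classes.values())
```

A table in which each generalized class has the same mix of diagnoses as the whole table passes for any t >= 0. A table in which one class contains only "flu" and another only "cancer" has distance 0.5 per class and fails for t < 0.5.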
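The Cache Sketch entry describes clients that consult a Bloom filter of potentially stale records. The Python sketch below shows that read decision using a plain Bloom filter with SHA-256-derived bit positions, which is an assumption. The paper's actual server-side construction, expiration tracking, and parameters are not reproduced, and `read_policy` and the record key are hypothetical. A key found in the sketch may be stale and is revalidated; otherwise, the cached copy is used.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter; SHA-256-derived bit positions are an assumption of this sketch."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k positions by hashing the key with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        # Server side (assumed): add a key when a record changes while cached copies may still be unexpired.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def read_policy(sketch, key):
    """Hypothetical client-side decision: keys in the sketch may be stale, so bypass the cache."""
    return "revalidate" if key in sketch else "use-cache"
```

Bloom filters produce false positives but no false negatives. As long as every potentially stale key has been added to the sketch, a stale record is never served from cache by mistake. At worst, some fresh records are revalidated unnecessarily.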
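The MSS-BE algorithm in the e-cosmos miner entry is based on PrefixSpan. For orientation, here is a compact Python sketch of basic PrefixSpan over static sequences of single events. It does not handle itemsets, multiple streams, or the streaming aspects of MSS-BE. The event labels "fix" and "key" are hypothetical stand-ins for eye-tracking fixations and keystrokes. Each frequent item extends the current prefix, and the database is then projected onto the suffixes that follow the item's first occurrence.

```python
def prefixspan(sequences, min_support):
    """Basic PrefixSpan for sequences of single items; returns (pattern, support) pairs."""
    results = []

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, support))
            # Project the database: keep the suffix after the item's first occurrence.
            new_projected = []
            for seq in projected:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        new_projected.append(suffix)
            mine(new_prefix, new_projected)

    mine([], sequences)
    return results
```

For example, with the hypothetical sequences `[["fix", "key", "fix"], ["fix", "key"], ["key", "fix"]]` and `min_support=2`, the patterns `["fix", "key"]` and `["key", "fix"]` each occur in two sequences.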
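The MV-IDX entry argues for performing visibility checks inside the index itself. The following Python sketch illustrates that idea under a simplifying assumption: each index entry carries the begin and end timestamps of its version, and a snapshot sees a version if begin_ts <= snapshot < end_ts. PostgreSQL's actual visibility rules (transaction IDs, snapshots with in-progress transactions) and the MV-IDX layout are not modeled, and all class and field names are hypothetical. Because the lookup decides visibility from index data alone, it returns at most one tuple pointer without fetching invisible versions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexEntry:
    key: int
    begin_ts: int            # timestamp at which this version became valid (simplified)
    end_ts: Optional[int]    # None means the version is still current
    tuple_id: str            # hypothetical pointer to the physical tuple version

def visible(entry, snapshot_ts):
    """Simplified snapshot visibility: begin_ts <= snapshot < end_ts."""
    return entry.begin_ts <= snapshot_ts and (entry.end_ts is None or snapshot_ts < entry.end_ts)

class VersionAwareIndex:
    """Hypothetical version-aware index that checks visibility without fetching tuples."""

    def __init__(self):
        self.entries = {}    # key -> list of IndexEntry, one per version

    def insert_version(self, key, ts, tuple_id):
        versions = self.entries.setdefault(key, [])
        for e in versions:
            if e.end_ts is None:
                e.end_ts = ts    # close the predecessor version in the index
        versions.append(IndexEntry(key, ts, None, tuple_id))

    def lookup(self, key, snapshot_ts):
        # At most one version of a data item is visible to a given snapshot.
        for e in self.entries.get(key, []):
            if visible(e, snapshot_ts):
                return e.tuple_id
        return None
```

A version-oblivious index would instead return every `tuple_id` for the key and leave the visibility check to the heap, which is the extra I/O the abstract describes.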