P241 - BTW2015 - Datenbanksysteme für Business, Technologie und Web
Recent publications
- Conference paper. D2Pt: Privacy-Aware Multiparty Data Publication (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Nielsen, Jan Hendrik; Janusz, Daniel; Taeschner, Jochen; Freytag, Johann-Christoph. Today, publication of medical data faces high legal barriers. On the one hand, publishing medical data is important for medical research. On the other hand, it is necessary to protect people's privacy by ensuring that the relationship between individuals and their related medical data remains unknown to third parties. Various data anonymization techniques remove as little identifying information as possible to maintain high data utility while satisfying the strict demands of privacy laws. Current research in this area proposes a multitude of concepts for data anonymization. The concept of k-anonymity allows data publication by hiding identifying information without losing its semantics. Based on k-anonymity, the concept of t-closeness incorporates semantic relationships between personal data values, thereby increasing the strength of the anonymization. However, these concepts are restricted to a centralized data source. In this paper, we extend existing data privacy mechanisms to enable joint data publication among multiple participating institutions. In particular, we adapt the concept of t-closeness for distributed data anonymization. We introduce Distributed two-Party t-closeness (D2Pt), a protocol that utilizes cryptographic algorithms to avoid a central component when anonymizing data adhering to the t-closeness property. That is, without a trusted third party, we achieve data privacy based on the notion of t-closeness.
- Conference paper. Privacy preserving record linkage with ppjoin (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Sehili, Ziad; Kolb, Lars; Borgs, Christian; Schnell, Rainer; Rahm, Erhard. Privacy-preserving record linkage (PPRL) is becoming increasingly important for matching and integrating records with sensitive data. PPRL not only has to preserve the anonymity of the persons or entities involved but should also be highly efficient and scalable to large datasets. We therefore investigate how to adapt PPJoin, one of the fastest approaches for regular record linkage, to PPRL, resulting in a new approach called P4Join. The use of bit vectors for PPRL also allows us to devise a parallel execution of P4Join on GPUs. We evaluate the new approaches and compare their efficiency with a PPRL approach based on multibit trees.
- Conference paper. Towards Automated Polyglot Persistence (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Schaarschmidt, Michael; Gessert, Felix; Ritter, Norbert. In this paper, we present an innovative solution for providing automated polyglot persistence based on service level agreements defined over functional and non-functional requirements of database systems. Complex applications require polyglot persistence to deal with a wide range of database-related needs. Until now, the overhead and the know-how required to manage multiple database systems have prevented many applications from employing efficient polyglot persistence solutions. Instead, developers are often forced to implement one-size-fits-all solutions that do not scale well and cannot easily be upgraded. Therefore, we introduce the concept of a Polyglot Persistence Mediator (PPM), which allows for runtime decisions on routing data to different backends according to schema-based annotations. This enables applications to either employ polyglot persistence right from the beginning or adopt new systems at any point with minimal overhead. We have implemented and evaluated the concept of automated polyglot persistence for a REST-based Database-as-a-Service setting. Evaluations were performed on various EC2 setups, showing a scalable write performance increase of 50-100% for a typical polyglot persistence scenario as well as drastically reduced latencies for reads and queries.
- Conference paper. The cache sketch: revisiting expiration-based caching in the age of cloud data management (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Gessert, Felix; Schaarschmidt, Michael; Wingerath, Wolfram; Friedrich, Steffen; Ritter, Norbert. The expiration-based caching model of the web is generally considered irreconcilable with the dynamic workloads of cloud database services, where expiration dates are not known in advance. In this paper, we present the Cache Sketch data structure, which makes expiration-based caching of database records feasible with rich tunable consistency guarantees. The Cache Sketch enables database services to leverage the large existing caching infrastructure of content delivery networks, browser caches and web caches to provide low latency and high scalability. The Cache Sketch employs Bloom filters to create compact representations of potentially stale records, transferring the task of cache coherence to clients. Furthermore, it minimizes the number of invalidations the service has to perform on caches that support them (e.g., CDNs). With different age-control policies, the Cache Sketch achieves very high cache hit ratios with arbitrarily low stale read probabilities. We present the Constrained Adaptive TTL Estimator to provide cache expiration dates that optimize the performance of the Cache Sketch and invalidations. To quantify the performance gains and to derive workload-optimal Cache Sketch parameters, we introduce the YCSB Monte-Carlo Caching Simulator (YMCA), a generic framework for simulating the performance and consistency characteristics of any caching and replication topology. We also provide empirical evidence for the efficiency of the Cache Sketch construction and the real-world latency reductions of database workloads under CDN caching.
- Edited book
- Conference paper. SQL-Grundlagen spielend lernen mit dem Text-Adventure SQL Island [Learning SQL basics through play with the text adventure SQL Island] (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Schildgen, Johannes; Deßloch, Stefan. We present SQL Island, a novel browser-based educational game built on the concept of text adventures. After a plane crash, the player's character lands on an island. The player talks to inhabitants, collects items, and fights villains. What makes this game special, however, is that the player controls the character solely through SQL commands. All necessary commands are introduced first, so no prior experience is required. After about an hour of play, the player has mastered SELECT, UPDATE, and DELETE queries as well as grouping, aggregate functions, and joins. The game can be played online at
- Conference paper. Sequential pattern mining of multimodal streams in the humanities (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Hassani, Marwan; Beecks, Christian; Töws, Daniel; Serbina, Tatiana; Haberstroh, Max; Niemietz, Paula; Jeschke, Sabina; Neumann, Stella; Seidl, Thomas. Research in the humanities is increasingly drawn to data mining and data management techniques in order to deal efficiently with complex scientific corpora. In particular, the exploration of hidden patterns within different types of data streams arising from psycholinguistic experiments is of growing interest in the area of translation process research. In order to support psycholinguistic experts in quantitatively discovering the non-self-explanatory behavior of the data, we propose the e-cosmos miner framework for mining, generating and visualizing sequential patterns hidden within multimodal streaming data. The introduced MSS-BE algorithm, based on the PrefixSpan method, searches for sequential patterns within multiple streaming inputs arriving from eye tracking and keystroke logging data recorded during translation tasks. The e-cosmos miner enables psycholinguistic experts to select different sequential patterns as they appear in the translation process, compare the evolving changes of their statistics during the process and track their occurrences within a special simulator.
- Conference paper. Visualizing the behavior of an elastic, energy-efficient database cluster (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Ganza, Sandy; Psota, Thomas; Schall, Daniel; Härder, Theo. Energy efficiency in databases is an emerging topic. Our research prototype WattDB dynamically adjusts the number of active servers in a cluster to the current workload to achieve energy proportionality. In this demo, we give insights into the partitioning process and WattDB's reaction to workload changes by presenting a monitoring GUI live. The whole process and the resulting configuration are visualized to give immediate feedback on how the cluster would react.
- Conference paper. Online bit flip detection for in-memory B-trees live! (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Kolditz, Till; Schlegel, Benjamin; Habich, Dirk; Lehner, Wolfgang. Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain higher performance and energy efficiency. As a side effect, integrated circuits such as CPUs and main memory become more and more vulnerable to external influences and thus unreliable, which results in increasing numbers of (multi-)bit flips. From a database perspective, bit flip errors in main memory will become a major challenge for modern in-memory database systems, which keep all their enterprise data in volatile, unreliable main memory. Existing hardware error control techniques like ECC-DRAM are able to detect and correct memory errors, but their detection and correction capabilities are limited and come with several downsides. To underline this, we heat up RAM live on-site to show possible error rates of future hardware. We previously presented various techniques for online error detection in the B-Tree, a widespread index structure, which increase its overall reliability [Kea14b]. We also show live performance comparisons in terms of throughput and error detection rates between several bit flip detecting B-Tree variants. In doing so, we demonstrate the tradeoff between detection accuracy and index throughput. Furthermore, we show annotated structural information about the trees, such as corrupted nodes and inaccessible sub-trees.
- Conference paper. Recoleta: A recommender system for events for personalised E-mail campaigns (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Eichinger, Frank; Wietreich, Immanuel. We demonstrate the RecoLeta system for event recommendations. It combines two different recommender approaches: one novel approach dedicated to music concert events and one state-of-the-art approach. We also present our big-data architecture for e-mail delivery and recommendation calculation in an in-memory database.
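
The Cache Sketch entry in this list describes using Bloom filters as a compact representation of potentially stale records, so that clients can decide for themselves whether a cached copy is safe to use. The following is a minimal sketch of that general idea only, not the authors' implementation: the `BloomFilter` class, the `read` helper, the filter size, the hash construction, and the dictionary-based cache and origin are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, key: str):
        # Derive num_hashes positions by salting SHA-256 with an index
        # (an illustrative choice, not taken from the paper).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def read(key, stale_sketch, cache, origin):
    """Hypothetical client-side read: bypass the cache if the key may be stale."""
    if key in cache and not stale_sketch.might_contain(key):
        return cache[key], "cache"
    value = origin[key]   # revalidate against the database service
    cache[key] = value    # refresh the cached copy
    return value, "origin"


if __name__ == "__main__":
    origin = {"user:1": "v2", "user:2": "v1"}   # current database state
    cache = {"user:1": "v1", "user:2": "v1"}    # user:1 is outdated in the cache
    sketch = BloomFilter()
    sketch.add("user:1")                        # service marks user:1 as potentially stale
    print(read("user:1", sketch, cache, origin))
    print(read("user:2", sketch, cache, origin))
```

Because a Bloom filter never yields false negatives, a key that was marked stale is always fetched from the origin; a false positive only costs an unnecessary round trip and never returns outdated data.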