P265 - BTW2017 - Datenbanksysteme für Business, Technologie und Web
- Conference paper: Anfrage-getriebener Wissenstransfer zur Unterstützung von Datenanalysten (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Wahl, Andreas M.; Endler, Gregor; Schwab, Peter K.; Herbst, Sebastian; Lenz, Richard.
  In larger organizations, different groups of data analysts work with different data sources to answer analytical questions. Formulating effective analytical queries requires that analysts possess profound knowledge of the existence, semantics, and usage contexts of the relevant data sources. Such knowledge is shared informally within individual groups of analysts, but is usually not made available to others in a formalized way, so potential synergies remain untapped. We present a novel approach that extends existing data management systems with additional capabilities for this kind of knowledge transfer. Our approach fosters collaboration between data analysts without disrupting established analysis processes. In contrast to prior research, analysts are supported in transferring the knowledge contained in analytical queries: relevant knowledge is extracted from the query log to facilitate data source discovery and incremental data integration, and the extracted knowledge is formalized and made available at query time.
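To make the query-log idea concrete, here is a minimal sketch, not the authors' system, of mining which tables are queried together from a log of SQL strings; the log contents, the regex, and the co-usage heuristic are all invented for illustration:

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical query log; in the paper's setting this would come from the DBMS.
query_log = [
    "SELECT c.name, SUM(o.total) FROM customers c JOIN orders o ON c.id = o.cust_id GROUP BY c.name",
    "SELECT region, COUNT(*) FROM customers GROUP BY region",
    "SELECT o.cust_id, s.carrier FROM orders o JOIN shipments s ON o.id = s.order_id",
]

TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([a-zA-Z_]\w*)", re.IGNORECASE)

def extract_tables(sql: str) -> set[str]:
    """Collect table names referenced after FROM/JOIN keywords."""
    return {m.lower() for m in TABLE_REF.findall(sql)}

# Count how often pairs of tables are queried together; frequently co-queried
# tables hint at join paths that other analysts already know about.
co_usage = Counter()
for sql in query_log:
    for pair in combinations(sorted(extract_tables(sql)), 2):
        co_usage[pair] += 1

print(co_usage.most_common(3))
```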
- Conference paper: Autonomous Data Ingestion Tuning in Data Warehouse Accelerators (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Stolze, Knut; Beier, Felix; Müller, Jens.
  The IBM DB2 Analytics Accelerator (IDAA) is a state-of-the-art hybrid database system that seamlessly extends the strong transactional capabilities of DB2 for z/OS with very fast processing of OLAP and analytical SQL workloads in Netezza. IDAA copies the data from DB2 for z/OS into its Netezza backend, and customers can tailor data maintenance to their needs. This copy process, the data load, can cover a whole table or just a physical table partition. IDAA also offers an incremental update feature, which employs replication technologies for low-latency data synchronization. The accelerator targets big relational databases with several TBs of data. The data load is therefore performance-critical, not only for the data transfer itself: the system also has to scale to a large number of tables, i.e., tens of thousands being loaded at the same time, and the administrative overhead for such a number of tables has to be minimized. In this paper, we present our work on a prototype geared towards efficiently loading data for many tables, where each table may store only a comparably small amount of data. A new load scheduler handles all concurrent load requests for disjoint sets of tables. This is not only required for a multi-tenant setup, but is also a significant improvement when attaching an accelerator to a single DB2 for z/OS system. We present architecture and implementation aspects of the new and improved load mechanism, along with results of initial performance evaluations.
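A toy scheduler can illustrate the core problem the abstract describes: many concurrent load requests over sets of tables, served by priority while skipping requests that conflict with loads already in flight. Everything below (class names, the priority policy) is an assumption for illustration, not IDAA's implementation:

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class LoadRequest:
    priority: int
    seq: int                                   # tie-breaker: FIFO within a priority
    tables: list[str] = field(compare=False)

class LoadScheduler:
    def __init__(self):
        self._queue = []
        self._seq = itertools.count()
        self._loading: set[str] = set()        # tables currently being copied

    def submit(self, tables, priority=10):
        heapq.heappush(self._queue, LoadRequest(priority, next(self._seq), list(tables)))

    def next_batch(self):
        """Pop the highest-priority request whose tables are not already loading."""
        deferred, batch = [], None
        while self._queue:
            req = heapq.heappop(self._queue)
            if self._loading.isdisjoint(req.tables):
                batch = req.tables
                self._loading.update(batch)
                break
            deferred.append(req)               # conflicts with an in-flight load
        for req in deferred:
            heapq.heappush(self._queue, req)
        return batch

    def done(self, tables):
        self._loading.difference_update(tables)

sched = LoadScheduler()
sched.submit(["t1", "t2"], priority=1)
sched.submit(["t2"], priority=5)   # conflicts with the first batch
print(sched.next_batch())          # ['t1', 't2']
print(sched.next_batch())          # None while t2 is still loading
```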
- Conference paper: Benchmarking Univariate Time Series Classifiers (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Schäfer, Patrick; Leser, Ulf.
  A time series is a collection of values sequentially recorded over time. Sensors for recording time series are now omnipresent, in RFID chips, wearables, smart homes, and event-based systems. Time series classification aims at predicting a class label for a time series whose label is unknown; to do so, a classifier has to train a model using labeled samples. Classification time is a key challenge given new applications like event-based monitoring, real-time decision making, or streaming systems. This paper is the first benchmark that compares 12 state-of-the-art time series classifiers based on prediction and classification times. We observed that most state-of-the-art classifiers require extensive training and classification times and might not be applicable for these new applications.
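The benchmark's measurement concern, keeping training time and classification time apart, can be sketched with a small timing harness. The classifiers and random data below are placeholders, not the 12 systems evaluated in the paper:

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 labeled series of length 150, 50 unlabeled ones.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 150))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(size=(50, 150))

for clf in (KNeighborsClassifier(n_neighbors=1), DecisionTreeClassifier()):
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)                 # training time
    train_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    clf.predict(X_test)                       # classification time: what matters
    predict_s = time.perf_counter() - t0      # for streaming/real-time use
    print(f"{type(clf).__name__}: train {train_s:.4f}s, predict {predict_s:.4f}s")
```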
- Conference paper: Big Data is no longer equivalent to Hadoop in the industry (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Tönne, Andreas.
  For a long time, industry projects solved big data problems with Hadoop. The massive scalability of MapReduce algorithms and the HBase database brought solutions to an unanticipated level of computing, but this success obscures the need for change: business goals that emerge from Industry 4.0 or IoT have long been addressed with a suboptimal architecture. New business goals require rethinking the big data architecture instead of being driven by the familiar Hadoop ecosystem. We discuss the transformation of a Hadoop-centric middleware solution into a streaming architecture from a business value perspective. The new architecture also replaces a single NoSQL database with polyglot persistence, which allows focusing on the best performance and quality for each data processing step. We also discuss alternative architectural approaches, such as Lambda, that were evaluated in the course of the transformation, and show that a single technology choice likely leads to a suboptimal solution.
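A minimal sketch of the polyglot-persistence idea, with in-memory stand-ins for the specialized stores; the event shape and store roles are assumptions, not the architecture from the talk:

```python
# Each processing step writes to the store best suited to it, instead of
# forcing everything through one NoSQL database.
raw_store, timeseries_store, search_index = [], {}, {}

def ingest(event):
    raw_store.append(event)                  # append-only log role (e.g. Kafka/HDFS)
    ts = timeseries_store.setdefault(event["sensor"], [])
    ts.append((event["t"], event["value"]))  # time-series store role
    search_index[event["id"]] = event        # document/search store role

for e in ({"id": 1, "sensor": "a", "t": 0, "value": 3.2},
          {"id": 2, "sensor": "a", "t": 1, "value": 3.4}):
    ingest(e)

print(len(raw_store), timeseries_store["a"], search_index[2]["value"])
```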
- Conference paper: The Big Picture: Understanding large-scale graphs using Graph Grouping with GRADOOP (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Junghanns, Martin; Petermann, André; Teichmann, Niklas; Rahm, Erhard.
  Graph grouping supports data analysts in decision making based on the characteristics of large-scale, heterogeneous networks containing millions or even billions of vertices and edges. We demonstrate graph grouping with GRADOOP, a scalable system supporting declarative programs composed from multiple graph operations. Using social network data, we highlight the analytical capabilities enabled by graph grouping in combination with other graph operators. The resulting graphs are visualized, and visitors are invited to either modify existing analytical programs or write new ones. GRADOOP is implemented on top of Apache Flink, a state-of-the-art distributed dataflow framework, which allows us to scale graph analytical programs across multiple machines. In the demonstration, programs can be executed either locally or remotely on our research cluster.
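The grouping operator itself can be sketched library-free: group vertices by a property and aggregate the edges between the resulting super-vertices. GRADOOP exposes this as a declarative operator on top of Apache Flink; the snippet below only illustrates the semantics on toy data:

```python
from collections import Counter

# Toy property graph: vertex -> "city" property, plus directed edges.
vertices = {1: "Leipzig", 2: "Leipzig", 3: "Dresden", 4: "Berlin"}
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]

# Super-vertices: one per distinct property value, aggregated by count.
super_vertices = Counter(vertices.values())

# Super-edges: edge counts between the groups of their endpoints.
super_edges = Counter((vertices[s], vertices[t]) for s, t in edges)

print(super_vertices)  # Counter({'Leipzig': 2, 'Dresden': 1, 'Berlin': 1})
print(super_edges)     # e.g. ('Leipzig', 'Dresden') occurs twice
```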
- Conference paper: Bosch IoT Cloud – Platform for the Internet of Things (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Binz, Tobias.
  By 2020, all electronic products of Bosch should be IoT-enabled. This IoT and digital transformation is an enormous opportunity for Bosch, addressing the fields of connected mobility, connected industry (Industry 4.0), connected buildings, and smart home. This overarching connectivity strategy is enabled by the Bosch IoT Cloud, a cloud platform specialized in developing, testing, and running scalable IoT services and applications.
- Conference paper: Bring Your Language to Your Data with EXASOL (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Mandl, Stefan; Kozachuk, Oleksandr; Graupmann, Jens.
  User-defined functions (UDFs) are an important feature of analytical SQL, as they allow processing data right inside relational queries. Typically, UDFs have to be written in a special language, which sometimes diminishes their practical use, especially when required libraries are not available in that language. An alternative approach is to let users provide functions as native low-level database extensions, which can be very dangerous. EXASOL now follows a radically different approach that integrates any programming language with the database without affecting data integrity: UDF language implementations are encapsulated within Linux containers that communicate with the database engine via a straightforward protocol. Using these technologies, users can make their own programming language available for UDFs in the database. As an example, we show how to provide C++ as a UDF language for EXASOL.
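The container-based UDF idea can be sketched as follows: the engine streams rows to user code that implements a simple entry point, and only the protocol crosses the container boundary. The run(ctx) convention and the Ctx shape below are invented for illustration, not EXASOL's actual interface:

```python
class Ctx:
    """Stand-in for the per-row context the engine hands to the UDF."""
    def __init__(self, row):
        self.__dict__.update(row)   # expose input columns as attributes

def run(ctx):
    # The user's UDF body, written in their language of choice:
    # here, normalize a string column.
    return ctx.name.strip().lower()

# The "engine" side of the protocol: stream rows through the UDF
# and collect one result per input row.
rows = [{"name": "  Alice "}, {"name": "BOB"}]
print([run(Ctx(r)) for r in rows])  # ['alice', 'bob']
```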
- Conference paper: The Complete Story of Joins (in HyPer) (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Neumann, Thomas; Leis, Viktor; Kemper, Alfons.
  SQL has evolved into an (almost) fully orthogonal query language that allows (arbitrarily deeply) nested subqueries in nearly all parts of a query. To avoid recursive evaluation strategies, which incur unbearable O(n²) runtime, we need an extended relational algebra to translate such subqueries into non-standard join operators. This paper concentrates on the non-standard join operators beyond the classical textbook inner joins, outer joins, and (anti) semi joins; their implementations in HyPer were covered in previous publications, which we refer to. Here we cover the new join operators mark-join and single-join at both levels: at the logical level, we show the translation and reordering possibilities needed to effectively optimize the resulting query plans; at the physical level, we describe hash-based and block-nested-loop implementations of these new joins. Based on our database system HyPer, we describe a blueprint for the complete query translation and optimization pipeline. The practical need for the advanced join operators is proven by an analysis of the well-known TPC-H and TPC-DS benchmarks, which revealed that all variants are actually used in these query sets.
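The semantics of the two new operators can be sketched with hash-based implementations over lists of dicts: the mark-join attaches a Boolean "match exists" marker to each outer tuple (for IN/EXISTS predicates), and the single-join returns at most one inner value per outer tuple, erroring out if a scalar subquery yields more than one row. This simplification ignores the NULL (three-valued) semantics and multi-column keys that the paper's operators must handle:

```python
def mark_join(outer, inner, key_out, key_in):
    """Annotate each outer row with whether a matching inner row exists."""
    matches = {row[key_in] for row in inner}
    return [dict(row, mark=row[key_out] in matches) for row in outer]

def single_join(outer, inner, key_out, key_in, value):
    """Attach at most one inner value per outer row (scalar subquery)."""
    ht = {}
    for row in inner:
        if row[key_in] in ht:
            raise RuntimeError("scalar subquery returned more than one row")
        ht[row[key_in]] = row[value]
    return [dict(row, sub=ht.get(row[key_out])) for row in outer]

orders = [{"cust": 1}, {"cust": 2}]
vips = [{"cust": 1, "discount": 0.1}]
print(mark_join(orders, vips, "cust", "cust"))
# [{'cust': 1, 'mark': True}, {'cust': 2, 'mark': False}]
print(single_join(orders, vips, "cust", "cust", "discount"))
# [{'cust': 1, 'sub': 0.1}, {'cust': 2, 'sub': None}]
```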
- Conference paper: Confidentiality à la Carte with Cipherbase (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Kossmann, Donald.
  Organizations move data and workloads to the cloud because the cloud is cheaper, more agile, and more secure. Unfortunately, the cloud is not perfect, and there are some fundamental tradeoffs that need to be made. The Cipherbase project studies the tradeoffs between confidentiality and functionality that arise when state-of-the-art cryptography is combined with databases in the cloud: the more operations are supported on encrypted data, the more information can be leaked unintentionally. There has been a great deal of work studying these tradeoffs in the specific context of property-preserving encryption techniques. For instance, deterministic encryption can support equality predicates directly over encrypted data, but it is also vulnerable to inference attacks. This talk discusses the tradeoffs that arise in a more general context, when trusted computing platforms such as FPGAs or Intel SGX are used to process encrypted data.
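The deterministic-encryption tradeoff mentioned in the abstract can be demonstrated in a few lines; an HMAC stands in for a real deterministic encryption scheme here, purely for illustration:

```python
import hmac, hashlib
from collections import Counter

KEY = b"client-secret"  # held by the client, never by the server

def det_enc(value: str) -> bytes:
    # Deterministic: equal plaintexts always yield equal ciphertexts.
    return hmac.new(KEY, value.encode(), hashlib.sha256).digest()

# Client encrypts; the server stores and sees only ciphertexts.
column = ["berlin", "munich", "berlin", "berlin"]
ciphertexts = [det_enc(v) for v in column]

# Functionality: the server can answer WHERE city = ? given an
# encrypted constant, without ever decrypting.
target = det_enc("berlin")
print(sum(c == target for c in ciphertexts))      # 3 matching rows

# Leakage: the server also learns the frequency distribution,
# enabling inference attacks against low-entropy columns.
print(Counter(ciphertexts).most_common(1)[0][1])  # most common value occurs 3x
```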
- Conference paper: ControVol Flex: Flexible Schema Evolution for NoSQL Application Development (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Haubold, Florian; Schildgen, Johannes; Scherzinger, Stefanie; Deßloch, Stefan.
  We demonstrate ControVol Flex, an Eclipse plugin for controlled schema evolution in Java applications backed by NoSQL document stores. The sweet spot of our tool is applications that are deployed continuously against the same production data store: each new release may bring schema changes that conflict with legacy data already stored in production. The type system internal to the predecessor tool, ControVol, is able to detect common schema conflicts and enables developers to resolve them with the help of object-mapper annotations. Our new tool, ControVol Flex, lets developers choose their schema-migration strategy: all legacy data can be migrated eagerly by means of NotaQL transformation scripts, or lazily, as declared by object-mapper annotations. The tool is even capable of combining both strategies, eagerly migrating data in the background while lazily migrating data that is accessed by the application in the meantime. From the viewpoint of the application, how legacy data is migrated remains transparent: every read access yields an entity that matches the structure the current application code expects. Our live demo shows how ControVol Flex gracefully solves a broad range of common schema-evolution tasks.
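The lazy half of such a migration strategy can be sketched as a read path that upgrades legacy documents on access and writes the result back, so eager background migration and lazy on-read migration converge on the same structure. The document fields and version scheme below are made up for illustration:

```python
CURRENT_VERSION = 2

def migrate(doc):
    """Upgrade a document one schema version at a time."""
    if doc.get("_version", 1) < 2:
        # v1 stored a single "name"; v2 splits it into first/last.
        first, _, last = doc.pop("name").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["_version"] = 2
    return doc

def read(store, doc_id):
    doc = migrate(store[doc_id])
    store[doc_id] = doc   # write back: lazy migration persists its result
    return doc

store = {7: {"name": "Ada Lovelace"}}  # legacy v1 document in "production"
print(read(store, 7))
# {'first_name': 'Ada', 'last_name': 'Lovelace', '_version': 2}
```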