P265 - BTW2017 - Datenbanksysteme für Business, Technologie und Web
Listing of P265 - BTW2017 - Datenbanksysteme für Business, Technologie und Web by publication date
1 - 10 of 56
- Conference paper: Metadata Management for Data Integration in Medical Sciences (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Kirsten, Toralf; Kiel, Alexander; Rühle, Mathias; Wagner, Jonas. Clinical and epidemiological studies are commonly used in medical sciences. They typically collect data using different input forms and information systems. Metadata describing input forms, database schemas and input systems are used for data integration but are typically distributed over different software tools, each of which uses a portion of the metadata, e.g., for loading (ETL), data presentation and analysis. In this paper, we describe an approach for managing metadata centrally and consistently in a dedicated Metadata Repository (MDR), from which metadata can be provided to different tools. Moreover, the MDR includes a matching component that creates schema mappings as a prerequisite for integrating captured medical data. We describe the approach and the MDR infrastructure and provide algorithms for creating schema mappings. Finally, we show selected evaluation results. The MDR is fully operational and is used to integrate data from a multitude of input forms and systems in the epidemiological study LIFE.
- Conference paper: The Big Picture: Understanding large-scale graphs using Graph Grouping with GRADOOP (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Junghanns, Martin; Petermann, André; Teichmann, Niklas; Rahm, Erhard. Graph grouping supports data analysts in decision making based on the characteristics of large-scale, heterogeneous networks containing millions or even billions of vertices and edges. We demonstrate graph grouping with GRADOOP, a scalable system supporting declarative programs composed of multiple graph operations. Using social network data, we highlight the analytical capabilities enabled by graph grouping in combination with other graph operators. The resulting graphs are visualized, and visitors are invited to either modify existing analytical programs or write new ones. GRADOOP is implemented on top of Apache Flink, a state-of-the-art distributed dataflow framework, and thus allows us to scale graph analytical programs across multiple machines. In the demonstration, programs can be executed either locally or remotely on our research cluster.
- Conference paper: Detection and Implicit Classification of Outliers via Different Feature Sets in Polygonal Chains (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Singhof, Michael; Klassen, Gerhard; Braun, Daniel; Conrad, Stefan. Many outlier detection tasks involve classifying outliers of different types. Most standard procedures solve this problem in two steps: first, an outlier detection algorithm is applied, which is normally trained only on outlier-free data, since samples of outliers are limited. Second, the outliers detected in that step are classified with a conventional classification algorithm that needs samples for all classes. However, the quality of the classification often suffers from the small number of available samples. Therefore, in this work, we introduce an outlier detection and classification algorithm that does not depend on training data for the classification process. Instead, we assume that different kinds of outliers arise from different processes and should therefore be detected by different outlier detection approaches. This work focuses on the example of outliers in mountain silhouettes.
- Conference paper: The Digital Business Platform (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Jost, Wolfram. Digital transformation turns everything upside down. Digital Business Platform: the foundation of a successful digital transformation. Anyone who thinks that digitalization is a new trend that will soon be over is mistaken. Comprehensive digitalization can no longer be stopped... it has already shaken entire industries to their foundations and will, if anything, spread even faster and further.
- Conference paper: The Complete Story of Joins (in HyPer) (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Neumann, Thomas; Leis, Viktor; Kemper, Alfons. SQL has evolved into an (almost) fully orthogonal query language that allows (arbitrarily deeply) nested subqueries in nearly all parts of the query. To avoid recursive evaluation strategies, which incur unbearable O(n²) runtime, we need an extended relational algebra that translates such subqueries into non-standard join operators. This paper concentrates on non-standard join operators beyond the classical textbook inner joins, outer joins and (anti) semi joins. Their implementations in HyPer were covered in previous publications, to which we refer. In this paper, we cover the new join operators mark-join and single-join at both levels: at the logical level, we show the translation and reordering possibilities that allow the resulting query plans to be optimized effectively. At the physical level, we describe hash-based and block-nested-loop implementations of these new joins. Based on our database system HyPer, we describe a blueprint for the complete query translation and optimization pipeline. The practical need for the advanced join operators is demonstrated by an analysis of the two well-known benchmarks TPC-H and TPC-DS, which revealed that all variants are actually used in these query sets.
- Conference paper: Confidentiality à la Carte with Cipherbase (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Kossmann, Donald. Organizations move data and workloads to the cloud because the cloud is cheaper, more agile, and more secure. Unfortunately, the cloud is not perfect, and some fundamental tradeoffs need to be made in the cloud. The Cipherbase project studies the tradeoffs between confidentiality and functionality that arise when state-of-the-art cryptography is combined with databases in the cloud: the more operations are supported on encrypted data, the more information can be leaked unintentionally. There has been a great deal of work studying these tradeoffs in the specific context of property-preserving encryption techniques. For instance, deterministic encryption can support equality predicates directly over encrypted data, but it is also vulnerable to inference attacks. This talk discusses the tradeoffs that arise in a more general context, when trusted computing platforms such as FPGAs or Intel SGX are used to process encrypted data.
- Conference paper: Distributed Grouping of Property Graphs with GRADOOP (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Junghanns, Martin; Petermann, André; Rahm, Erhard. Property graphs are an intuitive way to model, analyze and visualize complex relationships among heterogeneous data objects, for example, as they occur in social, biological and information networks. These graphs typically contain thousands or millions of vertices and edges, and their entire representation can easily overwhelm an analyst. One way to reduce complexity is to group vertices and edges into summary graphs. In this paper, we present an algorithm for graph grouping with support for attribute aggregation and structural summarization by user-defined vertex and edge properties. The algorithm is part of GRADOOP, an open-source system for graph analytics. GRADOOP is implemented on top of Apache Flink, a state-of-the-art distributed dataflow framework, and thus allows us to scale graph analytical programs across multiple machines. Our evaluation demonstrates the scalability of the algorithm on real-world and synthetic social network data.
- Conference paper: DeLorean: A Storage Layer to Analyze Physical Data at Scale (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Kußmann, Michael; Berens, Maximilian; Eitschberger, Ulrich; Kilic, Ayse; Lindemann, Thomas; Meier, Frank; Niet, Ramon; Schellenberg, Margarete; Stevens, Holger; Wishahi, Julian; Spaan, Bernhard; Teubner, Jens. Modern research in high-energy physics depends on the ability to analyze massive volumes of data in a short time. In this article, we report on DeLorean, a new system architecture for high-volume data processing in the domain of particle physics. DeLorean combines the simplicity and performance of relational database technology with the massive scalability of modern cloud execution platforms (in this case, Apache Drill). Experiments show a four-fold performance improvement over state-of-the-art solutions.
- Conference paper: InVerDa – The Liquid Database (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Herrmann, Kai; Voigt, Hannes; Seyschab, Thorsten; Lehner, Wolfgang. Applications that share a common database will, by their very nature, evolve over time. Often, former versions need to remain available, so database developers find themselves maintaining co-existing schema versions of multiple applications, usually with handwritten delta code, which is highly error-prone and accounts for significant costs in software projects. We showcase InVerDa, a tool that uses the richer semantics of a bidirectional database evolution language to generate all the delta code automatically, easily providing co-existing schema versions within one database. InVerDa automatically decides on an optimized physical database schema serving all schema versions, transparently optimizing performance for the current workload.
- Conference paper: Anfrage-getriebener Wissenstransfer zur Unterstützung von Datenanalysten [Query-driven knowledge transfer to support data analysts] (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Wahl, Andreas M.; Endler, Gregor; Schwab, Peter K.; Herbst, Sebastian; Lenz, Richard. In larger organizations, different groups of data analysts work with different data sources to answer analytical questions. Formulating effective analytical queries requires that data analysts have profound knowledge about the existence, semantics and usage contexts of relevant data sources. Such knowledge is shared informally within individual groups of data analysts but is usually not made available to others in formalized form. Potential synergies therefore remain unused. We present a novel approach that extends existing data management systems with additional capabilities for this knowledge transfer. Our approach fosters collaboration between data analysts without disrupting established analysis processes. In contrast to previous research approaches, analysts are supported in transferring the knowledge contained in analytical queries. Relevant knowledge is extracted from the query log to facilitate the discovery of data sources and incremental data integration. Extracted knowledge is formalized and provided at query time.
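The abstract of Kirsten et al. (Metadata Management for Data Integration in Medical Sciences) mentions a matching component that creates schema mappings, but it does not spell out the matching algorithms. As a rough illustration only, the following sketch shows a purely name-based matcher built on Python's `difflib`; the function names, the normalization step, and the 0.8 threshold are assumptions, and the actual MDR matcher very likely exploits richer metadata than element names.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a schema element name and drop separators such as '_' or ' '."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_schemas(source: list[str], target: list[str],
                  threshold: float = 0.8) -> dict[str, str]:
    """Map each source element to its most similar target element (hypothetical helper).

    Similarity is difflib's ratio on normalized names; pairs scoring below
    `threshold` are left unmapped so that a curator can resolve them manually.
    """
    mapping: dict[str, str] = {}
    for s in source:
        best, best_score = None, 0.0
        for t in target:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            mapping[s] = best
    return mapping


# Hypothetical input-form fields and database columns.
form_fields = ["Birth_Date", "BodyWeight_kg", "smoker"]
db_columns = ["birthdate", "body_weight_kg", "height_cm"]
print(match_schemas(form_fields, db_columns))
# → {'Birth_Date': 'birthdate', 'BodyWeight_kg': 'body_weight_kg'}
```

Using a threshold instead of always accepting the best candidate is a design choice: it trades recall for fewer wrong mappings, leaving elements such as `smoker` for manual review.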
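The central assumption in Singhof et al. (Detection and Implicit Classification of Outliers via Different Feature Sets in Polygonal Chains) is that each outlier type is caught by a detector working on its own feature set, so the detector that fires implicitly classifies the outlier without any labeled outlier samples. The sketch below illustrates that idea under strong simplifications: one z-score detector per feature, fitted only on outlier-free data. The features (`slope`, `roughness`), the outlier types, and the threshold `k = 3` are hypothetical and are not the features the authors use for mountain silhouettes.

```python
from statistics import mean, stdev


def zscore_flags(values: list[float], train: list[float], k: float = 3.0) -> list[bool]:
    """Flag values whose z-score relative to outlier-free training data exceeds k."""
    mu, sigma = mean(train), stdev(train)
    return [abs(v - mu) / sigma > k for v in values]


def detect_and_classify(samples, train, detectors, k=3.0):
    """Run one detector per feature; the detectors that fire name the outlier type(s).

    samples, train: lists of feature dicts (train contains no outliers).
    detectors: {outlier_type: feature_name} (hypothetical mapping).
    Returns one list of outlier types per sample; an empty list means "normal".
    """
    labels = [[] for _ in samples]
    for outlier_type, feature in detectors.items():
        flags = zscore_flags([s[feature] for s in samples],
                             [t[feature] for t in train], k)
        for i, flagged in enumerate(flags):
            if flagged:
                labels[i].append(outlier_type)
    return labels


# Hypothetical per-segment features of a silhouette polygonal chain.
train = [{"slope": 10.0, "roughness": 1.0}, {"slope": 12.0, "roughness": 1.2},
         {"slope": 11.0, "roughness": 0.9}, {"slope": 9.0, "roughness": 1.1},
         {"slope": 13.0, "roughness": 1.0}]
samples = [{"slope": 11.0, "roughness": 1.0},   # normal segment
           {"slope": 40.0, "roughness": 1.05},  # abrupt spike
           {"slope": 10.0, "roughness": 3.0}]   # noisy segment
print(detect_and_classify(samples, train, {"spike": "slope", "noise": "roughness"}))
# → [[], ['spike'], ['noise']]
```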
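For The Complete Story of Joins (in HyPer), a small sketch may help pin down what the two new operators compute. Roughly, and as a simplification of the paper's definitions: mark-join keeps every left tuple and appends a marker saying whether a join partner exists (useful, for example, when an `EXISTS` subquery appears inside a disjunction and therefore cannot simply filter), while single-join behaves like a left outer join that raises a runtime error when a tuple has more than one partner, matching scalar-subquery semantics. The code below is a simplified hash-based version: it ignores NULL handling (the paper's marker is three-valued) and supports only a single equality key. All function names are hypothetical, and this is not HyPer's implementation.

```python
from collections import defaultdict


def build_hash_table(right: list[dict], rkey: str) -> dict:
    """Build phase: group the right input's tuples by join key."""
    table = defaultdict(list)
    for row in right:
        table[row[rkey]].append(row)
    return table


def hash_mark_join(left, right, lkey, rkey, marker="mark"):
    """Emit every left tuple exactly once, extended with a boolean marker that
    says whether at least one join partner exists (NULL semantics ignored)."""
    table = build_hash_table(right, rkey)
    return [{**row, marker: row[lkey] in table} for row in left]


def hash_single_join(left, right, lkey, rkey, attr):
    """Left outer join that attaches `attr` of the (at most one) join partner;
    a second partner is a runtime error, as for a scalar subquery."""
    table = build_hash_table(right, rkey)
    result = []
    for row in left:
        partners = table.get(row[lkey], [])  # .get does not insert into the defaultdict
        if len(partners) > 1:
            raise ValueError("scalar subquery returned more than one row")
        result.append({**row, attr: partners[0][attr] if partners else None})
    return result


customers = [{"id": 1}, {"id": 2}]
orders = [{"cust": 1, "total": 5}]
print(hash_mark_join(customers, orders, "id", "cust"))
# → [{'id': 1, 'mark': True}, {'id': 2, 'mark': False}]
print(hash_single_join(customers, orders, "id", "cust", "total"))
# → [{'id': 1, 'total': 5}, {'id': 2, 'total': None}]
```

Both operators build the hash table once and probe it once per left tuple, so they keep the cost profile of an ordinary hash join instead of re-evaluating the subquery for every outer tuple.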
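The confidentiality/functionality tradeoff that Kossmann's Cipherbase talk describes for deterministic encryption can be shown in a few lines. The sketch below uses an HMAC tag as a stand-in for deterministic encryption: an HMAC cannot be decrypted, so it only models the property that matters here, namely that equal plaintexts produce equal ciphertexts. The key and the data are made up. The server can evaluate an equality predicate over the tags without seeing plaintext, but value frequencies survive, which is what frequency-based inference attacks exploit.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"hypothetical-demo-key"  # made-up key, for illustration only


def det_tag(value: str) -> str:
    """Deterministic keyed tag (HMAC-SHA256) standing in for deterministic
    encryption: equal plaintexts always yield equal tags."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# What the cloud server stores: tags only, never plaintext.
cities = ["Berlin", "Berlin", "Berlin", "Munich", "Hamburg", "Berlin"]
encrypted_column = [det_tag(c) for c in cities]

# Functionality: WHERE city = 'Munich' is evaluated by comparing tags.
query_tag = det_tag("Munich")
matching_rows = [i for i, t in enumerate(encrypted_column) if t == query_tag]
print(matching_rows)  # → [3]

# Leakage: the tag histogram equals the plaintext histogram, so an attacker
# with background knowledge can guess that the most frequent tag is "Berlin".
top_tag, top_count = Counter(encrypted_column).most_common(1)[0]
print(top_count)  # → 4
```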
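The grouping operator described in Junghanns et al. (Distributed Grouping of Property Graphs with GRADOOP) can be sketched on a single machine to show its semantics: vertices with equal values of a grouping property collapse into one summary vertex, and edges collapse into summary edges between the corresponding groups, each carrying an aggregate. This is a minimal illustration, not GRADOOP's algorithm or API: it uses one vertex grouping key, no edge grouping keys, and a count as the only aggregate, and it runs in memory rather than distributed on Apache Flink. The function name and sample data are hypothetical.

```python
from collections import Counter


def group_graph(vertices, edges, key):
    """Summarize a property graph by grouping vertices on one property.

    vertices: {vertex_id: {property: value}}; edges: [(src_id, dst_id)].
    Returns (summary vertices, summary edges), each mapped to a count aggregate.
    """
    group_of = {v: props[key] for v, props in vertices.items()}
    summary_vertices = Counter(group_of.values())
    # A summary edge between two groups stands for all edges between their members.
    summary_edges = Counter((group_of[s], group_of[d]) for s, d in edges)
    return dict(summary_vertices), dict(summary_edges)


vertices = {1: {"city": "Leipzig"}, 2: {"city": "Leipzig"}, 3: {"city": "Dresden"}}
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
print(group_graph(vertices, edges, "city"))
```

In a distributed setting, the same two steps map naturally onto a group-by over vertices followed by a join of edges with the vertex-to-group assignment and a second group-by over edges.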
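To make the delta code mentioned in the InVerDa abstract concrete, the sketch below hand-writes what such a tool would have to generate for a single evolution step: adding a column between schema version 1 and version 2, with both versions readable and writable on the same data. The physical layout (the version 1 table plus an auxiliary table for the new column) is one fixed, assumed choice, whereas InVerDa chooses the physical schema automatically. The class and method names are hypothetical and do not reflect InVerDa's evolution language.

```python
class AddColumnVersions:
    """Hypothetical bidirectional mapping for one evolution step:
    version 2 = version 1 plus one added column with a default value."""

    def __init__(self, column: str, default):
        self.column, self.default = column, default
        self.base: dict = {}  # id -> version-1 row (physically stored table)
        self.aux: dict = {}   # id -> value of the added column (auxiliary table)

    # Version 1 is the stored schema, so its access is direct.
    def insert_v1(self, row: dict) -> None:
        self.base[row["id"]] = dict(row)

    def read_v1(self) -> list[dict]:
        return [dict(r) for r in self.base.values()]

    # Version 2 is derived: reads merge base and auxiliary data...
    def read_v2(self) -> list[dict]:
        return [{**r, self.column: self.aux.get(i, self.default)}
                for i, r in self.base.items()]

    # ...and writes are split back into the two physical tables.
    def insert_v2(self, row: dict) -> None:
        row = dict(row)
        value = row.pop(self.column)
        self.base[row["id"]] = row
        self.aux[row["id"]] = value


db = AddColumnVersions("priority", default=0)
db.insert_v1({"id": 1, "title": "old app task"})
db.insert_v2({"id": 2, "title": "new app task", "priority": 5})
print(db.read_v1())  # both rows, without the priority column
print(db.read_v2())  # both rows; priority falls back to 0 for the row written via version 1
```

Every further evolution step would need its own pair of forward and backward mappings like these, which illustrates why handwritten delta code becomes error-prone as versions accumulate.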
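The approach of Wahl et al. (Anfrage-getriebener Wissenstransfer zur Unterstützung von Datenanalysten) extracts relevant knowledge from the query log. As an illustration only, not the authors' method, the sketch below mines a log of SQL strings for tables that are frequently queried together and suggests co-used tables for a given table, a crude form of knowledge that could be offered at query time. The regex-based table extraction is a deliberate simplification: it ignores comma-separated `FROM` lists, subqueries, and quoted identifiers, and all names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Simplified: only picks up the first identifier after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(query: str) -> list[str]:
    """Distinct table names referenced in one query, sorted."""
    return sorted(set(TABLE_REF.findall(query)))


def co_usage(query_log: list[str]) -> Counter:
    """Count how often each pair of tables appears in the same query."""
    pairs: Counter = Counter()
    for query in query_log:
        for a, b in combinations(tables_in(query), 2):
            pairs[(a, b)] += 1
    return pairs


def suggest(table: str, pairs: Counter, n: int = 3) -> list[str]:
    """Tables most often queried together with `table`."""
    scores: Counter = Counter()
    for (a, b), count in pairs.items():
        if a == table:
            scores[b] += count
        elif b == table:
            scores[a] += count
    return [t for t, _ in scores.most_common(n)]


# Hypothetical query log of several analysts.
log = [
    "SELECT * FROM sales s JOIN customers c ON s.cid = c.id",
    "SELECT * FROM sales JOIN products ON sales.pid = products.id",
    "select c.name from customers c join sales s on s.cid = c.id",
    "SELECT * FROM weather",
]
print(suggest("sales", co_usage(log)))  # → ['customers', 'products']
```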