P241 - BTW2015 - Datenbanksysteme für Business, Technologie und Web

https://dl.gi.de/handle/20.500.12116/21088

1 - 10 of 53

Conference paper
Sparqling pig - processing linked data with pig Latin
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Hagedorn, Stefan; Hose, Katja; Sattler, Kai-Uwe
In recent years, dataflow languages such as Pig Latin have emerged as flexible and powerful tools for handling complex analysis tasks on big data. These languages support schema flexibility as well as common programming patterns such as iteration. They offer extensibility through user-defined functions while running on top of scalable distributed platforms. In doing so, these languages enable analytical tasks while avoiding the limitations of classical query languages such as SQL and SPARQL. However, the tuple-oriented view of general-purpose languages like Pig does not match the specifics of modern datasets available on the Web very well, since these often use the RDF data model. Graph patterns, for instance, are one of the core concepts of SPARQL but have to be formulated as explicit joins, which burdens the user with the details of efficient query processing strategies. In this paper, we address this problem by proposing extensions to Pig that deal with linked data in RDF to bridge the gap between Pig and SPARQL for analytics. These extensions are realized by a set of user-defined functions and rewriting rules, so that the enhanced Pig scripts can still be compiled to plain MapReduce programs. For all proposed extensions, we discuss possible rewriting strategies and present results from an experimental evaluation.
Conference paper
KitMig - Flexible Live-Migration in mandantenfähigen Datenbanksystemen
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Göbel, Andreas; Sufryd, Marcel
Multi-tenant database systems enable a large number of tenants to share physical resources. Their use allows providers of cloud database services to reduce operating costs through high resource utilization and economies of scale. Migrating tenants within a server farm proves to be a key component for elasticity, load balancing, and maintainability in multi-tenant database systems. These application areas, however, place different and partly incompatible requirements on a migration. Because of their static procedure, existing approaches to live migration are suitable in only a few cases. In this paper, we present KitMig, a framework for live migration in multi-tenant database systems. Modeled on a construction kit, it provides various modules for determining the migration procedure. A suitable combination of modules allows the procedure to be adapted to the given requirements. The paper describes the KitMig phases, the associated modules, and the implementation in the open-source DBMS H2. Several experiments demonstrate the characteristics of different module combinations and the applicability of the resulting procedures in selected application areas.
Conference paper
Generic Business Simulation Using an In-Memory Column Store
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Butzmann, Lars; Klauck, Stefan; Müller, Stephan; Uflacker, Matthias; Sinzig, Werner; Plattner, Hasso
Value driver trees are a well-known methodology to model dependencies such as the definition of key performance indicators. While the models have well-known semantics, they lack the right tool support for business simulations, because a flexible implementation that supports multidimensional, hierarchical value driver trees and data bindings is very complex and computationally challenging. This paper tackles this problem by proposing an approach for generic enterprise simulations which are based on value driver trees. Our approach is two-fold: we present the definition of a simulation meta model at design time, and the run-time simulation tool. The simulation meta model describes the structure of the dependency graph, the data binding, and the parametrization of the model to simulate data changes. The simulation tool can then be used to create and edit simulation model instances and run simulations in real-time by leveraging an in-memory column store. Besides the formal description of the approach, this work presents a prototypical implementation of the simulation tool and an evaluation using data of a consumer packaged goods company.
Conference paper
Schema extraction and structural outlier detection for JSON-based nosql data stores
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Klettke, Meike; Störl, Uta; Scherzinger, Stefanie
Although most NoSQL data stores are schema-less, information on the structural properties of the persisted data is nevertheless essential during application development. Otherwise, accessing the data becomes simply impractical. In this paper, we introduce an algorithm for schema extraction that operates outside of the NoSQL data store. Our method is specifically targeted at semi-structured data persisted in NoSQL stores, e.g., in JSON format. Rather than designing the schema up front, extracting a schema in hindsight can be seen as a reverse-engineering step. Based on the extracted schema information, we propose a set of similarity measures that capture the degree of heterogeneity of JSON data and reveal structural outliers in the data. We evaluate our implementation on two real-life datasets: a database from the Wendelstein 7-X project and Web Performance Data.
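As a rough illustration of the idea of extracting structure outside the data store, the following Python sketch counts how many documents contain each (path, type) pair and reports the rare ones as candidate structural outliers. It is a simplified frequency count under assumed names (`paths_of`, `extract_structure`, `rare_paths`, the 0.5 threshold, and the sample documents are all hypothetical) and is not the algorithm or the similarity measures proposed in the paper.

```python
import json
from collections import Counter


def paths_of(value, path=""):
    """Yield (path, type name) pairs for one parsed JSON value."""
    yield (path or "/", type(value).__name__)
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths_of(child, f"{path}/{key}")
    elif isinstance(value, list):
        for child in value:
            yield from paths_of(child, f"{path}/[]")


def extract_structure(raw_documents):
    """Count, per (path, type) pair, the number of documents containing it."""
    counts = Counter()
    for raw in raw_documents:
        # A set ensures each pair is counted at most once per document.
        counts.update(set(paths_of(json.loads(raw))))
    return counts


def rare_paths(counts, total, threshold=0.5):
    """Return pairs that occur in less than `threshold` of all documents."""
    return sorted(pair for pair, n in counts.items() if n / total < threshold)


# Hypothetical sample data: the third document deviates structurally.
docs = [
    '{"id": 1, "title": "a"}',
    '{"id": 2, "title": "b"}',
    '{"id": "3", "title": "c", "tags": ["x"]}',
]
counts = extract_structure(docs)
outliers = rare_paths(counts, total=len(docs))
print(outliers)
```

In this toy input, the string-typed `id` and the extra `tags` array appear in only one of three documents, so they are the pairs flagged as rare.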
Conference paper
Privacy preserving record linkage with ppjoin
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Sehili, Ziad; Kolb, Lars; Borgs, Christian; Schnell, Rainer; Rahm, Erhard
Privacy-preserving record linkage (PPRL) is becoming increasingly important for matching and integrating records with sensitive data. PPRL not only has to preserve the anonymity of the persons or entities involved but should also be highly efficient and scalable to large datasets. We therefore investigate how to adapt PPJoin, one of the fastest approaches for regular record linkage, to PPRL, resulting in a new approach called P4Join. The use of bit vectors for PPRL also allows us to devise a parallel execution of P4Join on GPUs. We evaluate the new approaches and compare their efficiency with a PPRL approach based on multibit trees.
Conference paper
Erzeugung kalibrierter, metrischer Distanzen mittels multidimensionaler Skalierung
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Böttcher, Thomas; Schmitt, Ingo
To compare objects such as texts or images, similarities or distances with respect to various properties (e.g., edge, color, and texture features, GPS) are usually used. Using several properties improves expressiveness. The properties of the distance measures used are problematic here, in particular the triangle inequality. Efficient algorithms, e.g., metric index systems, nevertheless require these properties. In addition, one distance measure may dominate, e.g., when the distance distributions differ, which unintentionally distorts the aggregated overall result. In this work, we present an approach that solves both problems using multidimensional scaling (MDS), a method from multivariate statistics. We show how the dominance of a property can be detected and quantified. We also present an extended MDS approach that ensures the comparability of different distance measures. Our approach permits the use of non-metric distance measures. An evaluation on different distance distributions shows an almost complete reduction of the dominance.
Conference paper
Unnesting Arbitrary Queries
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Neumann, Thomas; Kemper, Alfons
SQL-99 allows for nested subqueries at nearly all places within a query. From a user's point of view, nested queries can greatly simplify the formulation of complex queries. However, nested queries that are correlated with the outer queries frequently lead to dependent joins with nested loops evaluations and thus poor performance. Existing systems therefore use a number of heuristics to unnest these queries, i.e., de-correlate them. These unnesting techniques can greatly speed up query processing, but are usually limited to certain classes of queries. To the best of our knowledge no existing system can de-correlate queries in the general case. We present a generic approach for unnesting arbitrary queries. As a result, the de-correlated queries allow for much simpler and much more efficient query evaluation.
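To make the notion of de-correlation concrete, the following Python sketch uses the standard `sqlite3` module to run a correlated subquery and a hand-written de-correlated equivalent on the same data. The tables `students` and `exams` and their contents are hypothetical, and the rewrite shown is a textbook manual transformation for one simple query class, not the generic technique presented in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE exams(sid INTEGER, grade REAL);
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO exams VALUES (1, 1.0), (1, 2.0), (2, 1.7), (2, 3.0);
""")

# Correlated form: the subquery refers to s.id from the outer query,
# so it is conceptually re-evaluated for every outer row.
correlated = """
    SELECT s.name, e.grade
    FROM students s JOIN exams e ON s.id = e.sid
    WHERE e.grade = (SELECT MIN(e2.grade) FROM exams e2 WHERE e2.sid = s.id)
    ORDER BY s.name
"""

# De-correlated form: compute the minimum grade per student once,
# then combine it with the outer tables using ordinary joins.
decorrelated = """
    SELECT s.name, e.grade
    FROM students s
    JOIN exams e ON s.id = e.sid
    JOIN (SELECT sid, MIN(grade) AS best FROM exams GROUP BY sid) m
      ON m.sid = s.id AND e.grade = m.best
    ORDER BY s.name
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_decorrelated = conn.execute(decorrelated).fetchall()
print(rows_correlated == rows_decorrelated)
```

Both queries return each student's best (lowest) grade; the second form no longer contains a subquery that depends on the outer row.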
Conference paper
Relationships for dynamic data types in RSQL
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Jäkel, Tobias; Kühn, Thomas; Hinkel, Stefan; Voigt, Hannes; Lehner, Wolfgang
Currently, there is a mismatch between the conceptual model of an information system and its implementation in a database management system (DBMS). Most conceptual modeling languages relate their conceptual entities with relationships, but relational database management systems rely solely on the notion of relations to model both entities and relationships. To make things worse, real-world objects are not static as assumed in such modeling languages, but change over time. Thus, modeling languages were enriched to model those scenarios as well. However, mapping these models onto relational databases requires the use of object-relational mapping engines, which in turn hide the semantics of the conceptual model from the DBMS. Consequently, traditional relational database systems cannot directly ensure specific consistency constraints and thus lose their meaning as the single point of truth for highly distributed information systems. To overcome these issues we have proposed RSQL, a data model and query language introducing role-based data structures in DBMSs. Although RSQL is able to handle complex objects, it does not support relationships between those objects. Therefore, this work adds relationships to RSQL by augmenting the data model and extending its query language. As a result, this extension allows for the direct representation of conceptual models with complex objects and relationships in the DBMS. Thus, relationships can be directly addressed in queries and the DBMS automatically ensures relationship consistency constraints as well as cardinality. In sum, a DBMS equipped with the extended RSQL is apt for storing and querying conceptual models and thus regains its rightful position as the single point of truth for highly distributed information systems.
Conference paper
Meeting the challenges of integrating large and diverse geographic databases
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Schäfers, Michael; Lipeck, Udo W.
Using data matching techniques to identify multiple representations of the same real-world entity is an essential step for all data integration tasks. While matching standard data types like strings or numbers with generic methods is well-studied, approaches for non-standard data have to deal with domain-specific challenges. For geographic databases containing spatial features we face a high degree of diversity in terms of geometric and semantic modeling between data sources. Likewise, complex geometric data types and topological relations require efficient processing. Finally, geodatabases can grow very large if they cover extensive regions or whole countries. In this paper, we present our SimMatching approach for integrating relational geodatabases that meets these challenges. In particular, we study road networks from several data sources. Our iterative algorithm matches semantically equivalent objects based on geometric and semantic attribute similarity measures. Relational similarity helps to solve difficult situations by exploiting the underlying graph structure of road networks: already confirmed neighbouring matchings improve the similarity value of a given matching. Adaptability to diverse input data is achieved by combining and weighting subsets of similarity measures. A greedy approach and an efficient end-to-end system built upon simple and flexible components outperform previous systems in terms of runtimes while showing matching results of high quality. Scalability to large geodatabases is supported by a partitioning framework together with parallel processing. We have experimentally verified our approach with large real-world datasets.
Conference paper
Implementierung von IR-Modellen auf Basis spaltenorientierter Datenbanken oder invertierter Listen
(Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015) Stadelmann, Thomas; Blank, Daniel; Henrich, Andreas
In information retrieval (IR), the use of column-oriented database management systems (DBMS) is being discussed as a way to gain additional flexibility, among other things by separating data management from search logic. The question arises whether such systems are suitable for practical use or whether their use is limited to prototyping. The goal of this work is therefore to compare IR systems based on column-oriented DBMS with conventional IR libraries based on inverted lists with respect to their effectiveness and efficiency, using the widely used Okapi BM25 retrieval model. In doing so, previous work is extended in particular with regard to the number and type of queries examined and the consistent use of compression options.
