Datenbank Spektrum 17(2) - July 2017

https://dl.gi.de/handle/20.500.12116/10988

Authors with the most documents

Heuer, Andreas

Paradies, Marcus

Voigt, Hannes

Breß, Sebastian

Bruder, Ilvio

1 - 10 of 10

Journal article
Big Graph Data Analytics on Single Machines – An Overview
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Paradies, Marcus; Voigt, Hannes
Driven by a multitude of use cases, graph data analytics has become a hot topic in research and industry. Particularly on big graphs, performing complex analytical queries efficiently to derive new insights is a challenging task. Systems that aim at solving the technical part of this challenge are often referred to as graph processing systems. They allow expressing and executing analytic algorithms and queries, while hiding most of the technical details related to efficiently storing and processing graph data. Since 2010, work on graph processing systems for distributed systems as well as shared memory systems has virtually exploded. In this article, we give an overview of this work with a particular focus on graph processing systems for large multiprocessor machines. We describe the state of the art established in recent years and outline trends and challenges in research and development that point towards the future of graph processing systems.
Journal article
Reducing the Distance Calculations when Searching an M‑Tree
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Guhlemann, Steffen; Petersohn, Uwe; Meyer-Wegener, Klaus
Recent years have brought rising interest in efficiently searching for similar entities in a broad range of domains. Such search can be used to facilitate working with unstructured data such as genome sequences, text corpora, complex production information, or multimedia content, where queries always contain some amount of noise. In such domains the only common structure is a distance function obeying the axioms of a metric. Because usually no other structural information is available, many distances have to be computed during a search. In contrast to classical database indexes, where the optimization focus is on reducing the number of disk accesses (or, in the case of in-memory databases, the number of tree traversal operations), a major cost driver in such multimedia domains is the number of distance calculations, which can be very computationally intensive. There exists a range of index structures for supporting similarity search in metric spaces. A very promising one is the M‑Tree, along with a number of compatible extensions (e.g. Slim-Tree, Bulk Loaded M‑Tree, multi way insertion M‑Tree, $M^{2}$-Tree, etc.). The M‑Tree family uses common algorithms for the $k$-nearest-neighbor and range search. These algorithms leave room for optimization in terms of necessary distance calculations. In this paper we present new algorithms for these tasks to considerably improve retrieval performance of all M‑Tree-compatible data structures.
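To illustrate why distance calculations can be saved at all in a metric space, the following is a minimal sketch of generic pivot-based pruning with the triangle inequality. It is not the algorithm presented in the paper and it does not build a tree. The function name `range_search`, the single pivot, and the Euclidean example data are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, used here as an example metric


def range_search(points, pivot, pivot_dists, query, radius):
    """Range query with triangle-inequality pruning against one pivot.

    pivot_dists[i] is the precomputed distance d(pivot, points[i]).
    Returns the matching points and the number of distance
    calculations performed at query time.
    """
    d_qp = dist(query, pivot)  # one calculation, shared by all candidates
    calcs = 1
    matches = []
    for obj, d_po in zip(points, pivot_dists):
        # Triangle inequality: d(q, o) >= |d(q, p) - d(p, o)|.
        # If this lower bound already exceeds the radius, obj cannot
        # match, so d(q, o) never has to be computed.
        if abs(d_qp - d_po) > radius:
            continue
        calcs += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, calcs


# Hypothetical example data: four 2-D points and a pivot at the origin.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
pivot = (0.0, 0.0)
pivot_dists = [dist(pivot, p) for p in points]

matches, calcs = range_search(points, pivot, pivot_dists, (1.0, 1.0), 1.5)
print(matches, calcs)
```

In this example two of the four candidates are discarded using only the stored pivot distances, so three distance calculations are performed instead of four. Tree-based metric indexes apply the same kind of bound per node, using distances stored at insertion time.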
Journal article
Dynamic Event-Activity Networks in Public Transportation
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Müller-Hannemann, Matthias; Rückert, Ralf
Real-time timetable information and delay management in public transportation systems are two challenging applications which can be modeled as optimization problems on dynamically changing, large and complex graphs, so-called event-activity networks. We describe both applications in detail, review the state-of-the-art and explain the requirements for systems solving these problems in a productive environment. Focussing on recent research on decision support for train dispatchers, we sketch the system architecture for the software prototype PANDA.
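As a rough illustration of what an event-activity network looks like, the sketch below models events (departures, arrivals) as nodes and activities (driving, dwelling) as edges with a minimum duration, and pushes a source delay forward through the network. This is a simplified textbook-style propagation rule, not the PANDA system or the optimization models discussed in the article. All names (`propagate_delays`, the event labels, the durations) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_delays(scheduled, activities, source_delays):
    """Propagate delays through an acyclic event-activity network.

    scheduled:     event -> scheduled time (minutes)
    activities:    (from_event, to_event) -> minimum duration (minutes)
    source_delays: event -> externally caused delay (minutes)
    Returns event -> updated time.
    """
    preds = {e: [] for e in scheduled}
    for (u, v), min_dur in activities.items():
        preds[v].append((u, min_dur))

    # Process every event after all of its predecessor events.
    order = TopologicalSorter(
        {e: [u for u, _ in preds[e]] for e in scheduled}
    ).static_order()

    updated = {}
    for e in order:
        t = scheduled[e] + source_delays.get(e, 0)
        for u, min_dur in preds[e]:
            # An event cannot happen before each incoming activity
            # has taken at least its minimum duration.
            t = max(t, updated[u] + min_dur)
        updated[e] = t
    return updated


# Hypothetical single train run A -> B -> C with small timetable buffers.
scheduled = {"dep_A": 0, "arr_B": 10, "dep_B": 12, "arr_C": 25}
activities = {
    ("dep_A", "arr_B"): 9,   # driving, 1 minute of slack
    ("arr_B", "dep_B"): 1,   # dwelling, 1 minute of slack
    ("dep_B", "arr_C"): 12,  # driving, 1 minute of slack
}
updated = propagate_delays(scheduled, activities, {"dep_A": 5})
print(updated)
```

In this example the 5-minute departure delay at A shrinks to 2 minutes at C, because each activity has one minute of slack that absorbs part of the delay.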
Journal article
Efficient Batched Distance, Closeness and Betweenness Centrality Computation in Unweighted and Weighted Graphs
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Then, Manuel; Günnemann, Stephan; Kemper, Alfons; Neumann, Thomas
Distance and centrality computations are important building blocks for modern graph databases as well as for dedicated graph analytics systems. Two commonly used centrality metrics are the compute-intense closeness and betweenness centralities, which require numerous expensive shortest distance calculations. We propose batched algorithm execution to run multiple distance and centrality computations at the same time and let them share common graph and data accesses. Batched execution amortizes the high cost of random memory accesses and presents new vectorization potential on modern CPUs and compute accelerators. We show how batched algorithm execution can be leveraged to significantly improve the performance of distance, closeness, and betweenness centrality calculations on unweighted and weighted graphs. Our evaluation demonstrates that batched execution can improve the runtime of these common metrics by over an order of magnitude.
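The idea of letting several traversals share graph accesses can be sketched with a bitmask-based multi-source breadth-first search on an unweighted graph. This is a simplified illustration under assumptions, not the authors' implementation: the function name `batched_bfs`, the adjacency-list layout, and the use of Python integers as bit sets are all choices made for this example.

```python
def batched_bfs(adj, sources):
    """Run one BFS per source concurrently, sharing neighbor scans.

    adj:     adjacency lists, adj[v] is the list of neighbors of v
    sources: list of start vertices; bit i belongs to sources[i]
    Returns dist, where dist[i][v] is the hop distance from
    sources[i] to v, or -1 if v is unreachable.
    """
    n = len(adj)
    seen = [0] * n      # bit i set: BFS i has already reached this vertex
    frontier = [0] * n  # bit i set: vertex is in the current frontier of BFS i
    dist = [[-1] * n for _ in sources]

    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        frontier[s] |= 1 << i
        dist[i][s] = 0

    level = 0
    while any(frontier):
        level += 1
        nxt = [0] * n
        # One scan of adj[v] serves every BFS that has v in its frontier.
        for v in range(n):
            if frontier[v]:
                for w in adj[v]:
                    nxt[w] |= frontier[v]
        for v in range(n):
            nxt[v] &= ~seen[v]  # keep only BFSs that reach v for the first time
            seen[v] |= nxt[v]
            bits, i = nxt[v], 0
            while bits:
                if bits & 1:
                    dist[i][v] = level
                bits >>= 1
                i += 1
        frontier = nxt
    return dist


# Hypothetical example: path graph 0 - 1 - 2 - 3, BFS from both ends.
adj = [[1], [0, 2], [1, 3], [2]]
dist = batched_bfs(adj, [0, 3])
print(dist)
```

Summing the rows of `dist` per source gives the totals needed for closeness centrality. The weighted and betweenness variants described in the abstract need additional bookkeeping that this sketch does not cover.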
Journal article
Editorial
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Voigt, Hannes; Paradies, Marcus; Härder, Theo
Journal article
The Hydra.PowerGraph System
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Meyer, Holger; Schering, Alf-Christian; Heuer, Andreas
Directed hypergraphs are known from graph theory [11] and are well understood within their own domain [7–9, 22, 23]. This paper provides an overview of the expressiveness of directed and typed hypergraphs as a modeling paradigm not only for the content of digital libraries and archives but also for a variety of applications. Furthermore, hypergraphs are sufficiently expressive to provide an implementation logic for conceptual models like CIDOC/CRM [18] in the context of museum-related systems and digital archives. The directed hypergraph model supports typed nodes and individual flexible sets of attributes on a per-node-type basis. This allows for efficient mapping on object-relational database structures. It also features a flexible, semi-structured type system for hyperedges. The graph model is accompanied by a set of well-defined graph operations forming an algebra and a descriptive hypergraph query language GrafL. This language supports typed, structure-based, and value-based queries as well as fundamental graph algorithms. The suitability of such a hypergraph-based model is illustrated with a large digital ethnological archive system, which is developed in the WossiDiA project [43, 52, 53].
Journal article
Efficiently Storing and Analyzing Genome Data in Database Systems
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Dorok, Sebastian; Breß, Sebastian; Teubner, Jens; Läpple, Horstfried; Saake, Gunter; Markl, Volker
Genome analysis enables researchers to detect mutations within genomes and deduce their consequences. Researchers need reliable analysis platforms to ensure reproducible and comprehensive analysis results. Database systems provide vital support to implement the required sustainable procedures. Nevertheless, they are not used throughout the complete genome-analysis process, because (1) database systems suffer from high storage overhead for genome data and (2) they introduce overhead during domain-specific analysis. To overcome these limitations, we integrate genome-specific compression into database systems using a specialized database schema. Thus, we can reduce the storage consumption of a database approach by up to 35%. Moreover, we exploit genome-data characteristics during query processing, allowing us to analyze real-world data sets up to five times faster than specialized analysis tools and eight times faster than a straightforward database approach.
Journal article
News
(Datenbank-Spektrum: Vol. 17, No. 2, 2017)
Journal article
BTW 2017 in Stuttgart
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Mitschang, Bernhard; Schwarz, Holger
Journal article
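Daten wie Sand am Meer – Datenerhebung, -strukturierung, -management und Data Provenance für die Ostseeforschung [Data in abundance: data collection, structuring, management, and data provenance for Baltic Sea research]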
(Datenbank-Spektrum: Vol. 17, No. 2, 2017) Bruder, Ilvio; Klettke, Meike; Möller, Mark Lukas; Meyer, Frank; Heuer, Andreas; Jürgensmann, Susanne; Feistel, Susanne
Data management for heterogeneous environmental data is illustrated using several projects from the maritime domain. Particular emphasis is placed on a pipeline for integrating heterogeneous research data, the traceability of the data (data provenance), and the consideration of temporal aspects in collecting, storing, and analyzing the data.
