Recent publications
- Journal article: Polyglot Persistence (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Gessert, Felix; Ritter, Norbert
- Journal article: Genome sequence analysis with MonetDB (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Cijvat, Robin; Manegold, Stefan; Kersten, Martin; Klau, Gunnar W.; Schönhuth, Alexander; Marschall, Tobias; Zhang, Ying
  Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing a genome takes little time and money, but yields terabytes of data to be stored and analyzed. Biologists often face excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.
- Journal article: Placement-Safe Operator-Graph Changes in Distributed Heterogeneous Data Stream Systems (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Pollner, Niko; Steudtner, Christian; Meyer-Wegener, Klaus
  Data stream processing systems enable querying continuous data without first storing it. Data stream queries may combine data from distributed data sources, such as different sensors in an environmental sensing application. This suggests distributed query processing, which reduces the amount of transferred data and makes more processing resources available. However, distributed query processing on possibly heterogeneous platforms complicates query optimization. This article investigates query optimization through operator graph changes and its interaction with operator placement on heterogeneous distributed systems. Pre-placement operator graph changes may prevent certain operator placements. As a result, the resource consumption of the query execution may unexpectedly increase. Based on the operator placement problem modeled as a task assignment problem (TAP), we prove that it is NP-hard to decide in general whether an arbitrary operator graph change may negatively influence the best possible TAP solution. We present conditions for several specific operator graph changes under which the best possible TAP solution is guaranteed to be preserved.
- Journal article: Die Arbeitsgruppe Datenbanksysteme an der Philipps-Universität Marburg [The Database Systems Research Group at Philipps-Universität Marburg] (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Seeger, Bernhard
  This article presents the Database Systems Research Group at the University of Marburg. It gives a brief historical review and discusses current research areas. It also describes the group's teaching activities and briefly outlines its cooperation with other researchers.
- Journal article: Query Optimization in Heterogenous Event Processing Federations (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Pinnecke, Marcus; Hoßbach, Bastian
  Continuous processing of event streams has evolved into an important class of data management over the last years and will become even more important due to novel applications such as the Internet of Things. Because systems for data stream and event processing have been developed independently of each other, often in competition and without any standards, the Stream Processing System (SPS) landscape is extremely heterogeneous today. To overcome the problems caused by this heterogeneity, a novel event processing middleware, the Java Event Processing Connectivity (JEPC), has been presented recently. However, although SPSs can be accessed uniformly using JEPC, their different performance profiles, caused by different algorithms and implementations, remain. This creates an opportunity for query optimization, because individual system strengths can be exploited. In this paper, we present a novel query optimizer that exploits the technical heterogeneity in a federation of different unified SPSs. Taking into account the different performance profiles of SPSs, we address query plan partitioning, candidate selection, and the reduction of inter-system communication in order to improve overall query performance. We suggest a heuristic that finds a good initial mapping of sub-plans to a set of heterogeneous SPSs. An experimental evaluation clearly shows that heterogeneous federations generally outperform homogeneous federations and that our heuristic performs well in practice.
- Journal article: Dissertationen [Dissertations] (Datenbank-Spektrum: Vol. 15, No. 3, 2015)
- Journal article: VAT: A System for Visualizing, Analyzing and Transforming Spatial Data in Science (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Authmann, Christian; Beilschmidt, Christian; Drönner, Johannes; Mattig, Michael; Seeger, Bernhard
  The amount of available data changes the style of research in geo-scientific domains, and thus influences the requirements for spatial processing systems. To support data-driven research and exploratory workflows, we propose the Visualization, Analysis & Transformation system (VAT). We first identify ten fundamental requirements, which range from support for spatial data types through low-latency computation to visualization techniques. Based on these, we evaluate state-of-the-art systems from the domains of spatial frameworks, GIS, workflow systems, scientific databases and Big Data solutions. The goal of the VAT system is to overcome the identified limitations through a holistic approach to raster and vector data, demand-driven and tiled processing, and the efficient use of heterogeneous hardware architectures. A first comparison with other systems shows the validity of our approach.
- Journal article: Modulares Verteilungskonzept für Datenstrommanagementsysteme [A Modular Distribution Concept for Data Stream Management Systems] (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Michelsen, Timo; Brand, Michael; Appelrath, H.-Jürgen
  Depending on the network architecture and the use case, there are different strategies for distributing continuous queries in distributed data stream management systems. Committing to a single strategy can be a disadvantage, especially when the network architecture or the use case changes. This paper presents an approach for flexible and extensible query distribution in distributed data stream management systems. The approach comprises three steps: 1) partitioning, 2) modification, and 3) allocation. Partitioning splits a continuous query into disjoint subqueries. The optional modification step allows mechanisms such as fragmentation or replication to be applied. Finally, allocation assigns the individual subqueries to nodes in the network, where they are executed. Independent strategies can be used for each step. This modular design enables customized query distribution. In addition, existing strategies from other work and systems can be integrated (e.g., further allocation strategies). The paper presents example strategies for each of the three steps. Three application examples show the advantages of the presented modular approach over a fixed distribution strategy.
- Journal article: Editorial (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Härder, Theo
- Journal article: Instant recovery with write-ahead logging (Datenbank-Spektrum: Vol. 15, No. 3, 2015) Härder, Theo; Sauer, Caetano; Graefe, Goetz; Guy, Wey
  Instant recovery improves system availability by reducing the mean time to repair, i.e., the interval during which a database is not available for queries and updates due to recovery activities. Variants of instant recovery pertain to system failures, media failures, node failures, and combinations of multiple failures. After a system failure, instant restart permits new transactions immediately after log analysis, before and concurrently with “redo” and “undo” recovery actions. After a media failure, instant restore permits new transactions immediately after allocation of a replacement device, before and concurrently with restoring backups and replaying the recovery log. Write-ahead logging is already ubiquitous in data management software. The recent definition of single-page failures and techniques for log-based single-page recovery enable immediate, lossless repair after a localized wear-out in novel or traditional storage hardware. In addition, they form the backbone of on-demand “redo” in instant restart, instant restore, and eventually instant failover. Thus, they complement on-demand invocation of traditional single-transaction “undo” or rollback. In addition to these instant recovery techniques, the discussion introduces self-repairing indexes and much faster offline restore operations, which impose no slowdown in backup operations and hardly any slowdown in log archiving operations. The new restore techniques also render differential and incremental backups obsolete, complete backup commands on a database server practically instantly, and even permit taking full up-to-date backups without imposing any load on the database server.
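
The log-based single-page recovery mentioned in the last abstract can be illustrated with a toy model. The sketch below is not the authors' implementation. It assumes that every log record carries a pointer to the previous record for the same page, so a single page can be rebuilt on demand from a backup image without scanning the whole log. All names (`LogRecord`, `Page`, `Log`, `update`, `recover_page`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int                      # log sequence number of this record
    page_id: int                  # page the update applies to
    key: str
    value: str
    prev_page_lsn: Optional[int]  # previous record for the same page (assumed chain)


@dataclass
class Page:
    page_id: int
    page_lsn: int = 0             # LSN of the last update reflected in this image
    data: Dict[str, str] = field(default_factory=dict)


class Log:
    """In-memory stand-in for a write-ahead log with per-page chains."""

    def __init__(self) -> None:
        self.records: Dict[int, LogRecord] = {}
        self.last_lsn_for_page: Dict[int, int] = {}
        self.next_lsn = 1

    def append(self, page_id: int, key: str, value: str) -> LogRecord:
        rec = LogRecord(self.next_lsn, page_id, key, value,
                        self.last_lsn_for_page.get(page_id))
        self.records[rec.lsn] = rec
        self.last_lsn_for_page[page_id] = rec.lsn
        self.next_lsn += 1
        return rec


def update(log: Log, page: Page, key: str, value: str) -> None:
    # Write-ahead rule: the log record exists before the page is changed.
    rec = log.append(page.page_id, key, value)
    page.data[key] = value
    page.page_lsn = rec.lsn


def recover_page(log: Log, backup: Page) -> Page:
    """On-demand redo for one page: follow its chain back to the backup's
    LSN, then replay the missing records in forward order."""
    chain: List[LogRecord] = []
    lsn = log.last_lsn_for_page.get(backup.page_id)
    while lsn is not None and lsn > backup.page_lsn:
        rec = log.records[lsn]
        chain.append(rec)
        lsn = rec.prev_page_lsn
    page = Page(backup.page_id, backup.page_lsn, dict(backup.data))
    for rec in reversed(chain):
        page.data[rec.key] = rec.value
        page.page_lsn = rec.lsn
    return page
```

Because the chain is followed per page, recovering one page touches only that page's log records. This is the property that lets repair happen on demand, when a transaction first accesses the page, rather than in a blocking recovery phase.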