Incremental ETL Pipeline Scheduling for Near Real-Time Data Warehouses

dc.contributor.author: Qu, Weiping
dc.contributor.author: Deßloch, Stefan
dc.contributor.editor: Mitschang, Bernhard
dc.contributor.editor: Nicklas, Daniela
dc.contributor.editor: Leymann, Frank
dc.contributor.editor: Schöning, Harald
dc.contributor.editor: Herschel, Melanie
dc.contributor.editor: Teubner, Jens
dc.contributor.editor: Härder, Theo
dc.contributor.editor: Kopp, Oliver
dc.contributor.editor: Wieland, Matthias
dc.date.accessioned: 2017-06-20T20:24:29Z
dc.date.available: 2017-06-20T20:24:29Z
dc.date.issued: 2017
dc.description.abstract: We present our work based on an incremental ETL pipeline for on-demand data warehouse maintenance. Pipeline parallelism is exploited to concurrently execute a chain of maintenance jobs, each of which takes a batch of delta tuples extracted from source-local transactions with commit timestamps preceding the arrival time of an incoming warehouse query and calculates final deltas to bring relevant warehouse tables up-to-date. Each pipeline operator runs in a single, non-terminating thread to process one job at a time and re-initializes itself for a new one. However, to continuously perform incremental joins or maintain slowly changing dimension (SCD) tables, the same staging tables or dimension tables can be concurrently accessed and updated by distinct pipeline operators which work on different jobs. Inconsistencies can arise without proper thread coordination. In this paper, we propose two types of consistency zones for SCD and incremental join to address this problem. In addition, we review existing pipeline scheduling algorithms in our incremental ETL pipeline with consistency zones.
dc.identifier.isbn: 978-3-88579-659-6
dc.identifier.pissn: 1617-5468
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: Datenbanksysteme für Business, Technologie und Web (BTW 2017)
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-265
dc.title: Incremental ETL Pipeline Scheduling for Near Real-Time Data Warehouses
dc.type: Text/Conference Paper
gi.citation.endPage: 308
gi.citation.startPage: 299
gi.conference.date: March 6-10, 2017
gi.conference.location: Stuttgart
gi.conference.sessiontitle: Streaming and Dataflows

Files

Original bundle
Name: paper20.pdf
Size: 524.6 KB
Format: Adobe Portable Document Format