Preserving Recomputability of Results from Big Data Transformation Workflows

Kricke, Matthias; Grimmer, Martin; Schmeißer, Michael

dc.contributor.author	Kricke, Matthias
dc.contributor.author	Grimmer, Martin
dc.contributor.author	Schmeißer, Michael
dc.date.accessioned	2018-01-08T08:08:01Z
dc.date.available	2018-01-08T08:08:01Z
dc.date.issued	2017
dc.description.abstract	The ability to recompute results from raw data at any time is important for data-driven companies, both to ensure data stability and to selectively incorporate new data into an already delivered data product. However, data transformation processes are heterogeneous, and manual work by domain experts may be part of the process that creates a deliverable data product. Because domain experts and their work are expensive and time-consuming, a recomputation process must be able to reapply earlier human interactions automatically. This becomes even more challenging when external systems are used or when data changes over time. In this paper, we propose a system architecture that ensures recomputability of results from big data transformation workflows on internal and external systems by using distributed key-value data stores. Furthermore, the architecture can incorporate human interactions from earlier data transformation processes. We describe how our approach significantly reduces the load on external systems and at the same time increases the performance of the big data transformation workflows.
dc.identifier.pissn	1610-1995
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/11021
dc.publisher	Springer
dc.relation.ispartof	Datenbank-Spektrum: Vol. 17, No. 3
dc.relation.ispartofseries	Datenbank-Spektrum
dc.subject	BigData
dc.subject	Bitemporality
dc.subject	Recomputability
dc.subject	System architecture
dc.subject	Time-to-consistency
dc.title	Preserving Recomputability of Results from Big Data Transformation Workflows
dc.type	Text/Journal Article
gi.citation.endPage	253
gi.citation.startPage	245

Collections

Datenbank Spektrum 17(3) - November 2017
