Browsing by Author "Qu, Weiping"
1 - 2 of 2
- Journal article: A Real-time Materialized View Approach for Analytic Flows in Hybrid Cloud Environments
  (Datenbank-Spektrum: Vol. 14, No. 2, 2014)
  Qu, Weiping; Dessloch, Stefan
  Next-generation business intelligence (BI) enables enterprises to react quickly in changing business environments. Increasingly, data integration pipelines need to be merged with query pipelines for real-time analytics on operational data. Newly emerging hybrid analytic flows, which consist of a set of extract-transform-load (ETL) jobs together with analytic jobs running over multiple platforms with different functionality, are becoming attractive. In traditional databases, materialized views are used to optimize query performance. In cross-platform, large-scale data transformation environments, similar challenges (e.g. view selection) arise when using materialized views. In this work, we propose an approach that generates materialized views in hybrid flows and maintains these views in a query-driven, incremental manner. To accelerate data integration processes, the location of a materialization point in a transformation flow varies dynamically based on metrics such as source update rates and maintenance cost in terms of flow operations. In addition, choosing the most suitable platform to host the views, for example materializing and maintaining intermediate results of Hadoop jobs in relational databases, is shown to yield better performance.
- Conference paper: Incremental ETL Pipeline Scheduling for Near Real-Time Data Warehouses
  (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017)
  Qu, Weiping; Deßloch, Stefan
  We present our work based on an incremental ETL pipeline for on-demand data warehouse maintenance. Pipeline parallelism is exploited to concurrently execute a chain of maintenance jobs. Each job takes a batch of delta tuples, extracted from source-local transactions whose commit timestamps precede the arrival time of an incoming warehouse query, and calculates final deltas to bring the relevant warehouse tables up to date. Each pipeline operator runs in a single, non-terminating thread, processes one job at a time, and re-initializes itself for the next one. However, to continuously perform incremental joins or maintain slowly changing dimension (SCD) tables, the same staging tables or dimension tables can be concurrently accessed and updated by distinct pipeline operators working on different jobs. Inconsistencies can arise without proper thread coordination. In this paper, we propose two types of consistency zones, for SCD and for incremental join, to address this problem. In addition, we review existing pipeline scheduling algorithms in our incremental ETL pipeline with consistency zones.
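
The batch selection described in the second abstract (a maintenance job receives the delta tuples whose commit timestamps precede the arrival time of a warehouse query) can be illustrated with a small sketch. This is not the authors' implementation: the names `Delta` and `split_for_query`, the integer timestamps, and the in-memory list of pending deltas are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """Hypothetical delta tuple captured from a source-local transaction."""
    commit_ts: int                      # commit timestamp (assumed to be an integer)
    table: str                          # source table the change belongs to
    row: dict = field(default_factory=dict)


def split_for_query(pending, query_arrival_ts):
    """Split pending deltas into the batch for one maintenance job and the rest.

    The batch holds every delta committed strictly before the query arrived;
    deltas committed at or after that time stay pending for a later job.
    """
    batch = [d for d in pending if d.commit_ts < query_arrival_ts]
    remaining = [d for d in pending if d.commit_ts >= query_arrival_ts]
    return batch, remaining


# Usage: a query arriving at time 5 only needs the delta committed at time 1.
pending = [Delta(1, "orders"), Delta(5, "orders"), Delta(9, "customers")]
batch, remaining = split_for_query(pending, query_arrival_ts=5)
```

Here the boundary is strict, following the abstract's wording "preceding the arrival time". Whether a delta committed exactly at the arrival time belongs to the batch is a design choice the abstract does not settle.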