Text document

Computation Offloading in JVM-based Dataflow Engines

Date

2019

Publisher

Gesellschaft für Informatik, Bonn

Abstract

State-of-the-art dataflow engines, such as Apache Spark and Apache Flink, scale out on large clusters for a variety of data-processing tasks, including machine learning and data mining algorithms. However, being based on the JVM, they are unable to apply optimizations supported by modern CPUs. In contrast, specialized data processing frameworks scale up by exploiting modern CPU characteristics. The goal of this thesis is to find the sweet spot between scale-out and scale-up systems by offloading computation from dataflow engines to specialized systems. We propose two computation offloading methods, reason about their applicability, and implement a prototype based on Apache Spark. Our evaluation shows that for compute-intensive tasks, computation offloading improves performance by up to 2.5x. For certain UDF scenarios, computation offloading is up to 3x slower: our microbenchmarks show that 80% of the time is spent on serialization operations. By exchanging data without serialization, computation offloading achieves performance improvements of up to 10x.

Description

Gavriilidis, Haralampos (2019): Computation Offloading in JVM-based Dataflow Engines. BTW 2019 – Workshopband. DOI: 10.18420/btw2019-ws-20. Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-684-8. pp. 195-204. Student Program. Rostock. March 4-8, 2019
