Computation Offloading in JVM-based Dataflow Engines

Gavriilidis, Haralampos

Text document

Files

D1-4.pdf (282.21 KB)

Date

2019

Authors

Gavriilidis, Haralampos

Source

BTW 2019 – Workshopband

Studierendenprogramm

Publisher

Gesellschaft für Informatik, Bonn

Abstract

State-of-the-art dataflow engines, such as Apache Spark and Apache Flink, scale out on large clusters for a variety of data-processing tasks, including machine learning and data mining algorithms. However, because they are based on the JVM, they are unable to apply optimizations supported by modern CPUs. In contrast, specialized data processing frameworks scale up by exploiting modern CPU characteristics. The goal of this thesis is to find the sweet spot between scale-out and scale-up systems by offloading computation from dataflow engines to specialized systems. We propose two computation offloading methods, reason about their applicability, and implement a prototype based on Apache Spark. Our evaluation shows that for compute-intensive tasks, computation offloading improves performance by up to 2.5x. For certain UDF scenarios, computation offloading performs worse by up to 3x: our microbenchmarks show that 80% of the time is spent on serialization operations. By employing data exchange without serialization, computation offloading achieves performance improvements of up to 10x.

Gavriilidis, Haralampos (2019): Computation Offloading in JVM-based Dataflow Engines. BTW 2019 – Workshopband. DOI: 10.18420/btw2019-ws-20. Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-684-8. pp. 195-204. Studierendenprogramm. Rostock. March 4-8, 2019

Keywords

dataflow engines, computation offloading, data exchange, native execution

DOI

10.18420/btw2019-ws-20

Collections

P290 - BTW2019 - Datenbanksysteme für Business, Technologie und Web - Workshopband
