Applying Machine Learning Models to Scalable DataFrames with Grizzly

Kläbe, Steffen; Hagedorn, Stefan


dc.contributor.author	Kläbe, Steffen
dc.contributor.author	Hagedorn, Stefan
dc.contributor.editor	Kai-Uwe Sattler
dc.contributor.editor	Melanie Herschel
dc.contributor.editor	Wolfgang Lehner
dc.date.accessioned	2021-03-16T07:57:09Z
dc.date.available	2021-03-16T07:57:09Z
dc.date.issued	2021
dc.description.abstract	The popular Python Pandas framework provides an easy-to-use DataFrame API that enables a broad range of users to analyze their data. However, Pandas faces severe scalability issues in terms of runtime and memory consumption, limiting the usability of the framework. In this paper we present Grizzly, a replacement for Python Pandas. Instead of bringing data to the operators, as Pandas does, Grizzly ships program complexity to database systems by transpiling the DataFrame API to SQL code. Additionally, Grizzly offers user-friendly support for combining different data sources, user-defined functions, and applying Machine Learning models directly inside the database system. Our evaluation shows that Grizzly significantly outperforms Pandas as well as state-of-the-art frameworks for distributed Python processing in several use cases.	en
dc.identifier.doi	10.18420/btw2021-10
dc.identifier.isbn	978-3-88579-705-0
dc.identifier.pissn	1617-5468
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/35793
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik, Bonn
dc.relation.ispartof	BTW 2021
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311
dc.title	Applying Machine Learning Models to Scalable DataFrames with Grizzly	en
gi.citation.endPage	214
gi.citation.startPage	195
gi.conference.date	13-17 September 2021
gi.conference.location	Dresden
gi.conference.sessiontitle	ML & Data Science
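
The abstract's central idea is to transpile DataFrame operations into SQL so that the database system, not the Python process, does the work. The following is a minimal sketch of that idea, not Grizzly's implementation. The `LazyFrame` class, its method names (`project`, `filter`, `groupby`, `agg`, `collect`), and the use of SQLite as the backend are hypothetical choices for illustration. Predicates are passed as raw SQL strings to keep the example short.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical DataFrame-like object that records operations instead of
    executing them. Each method returns a new frame; SQL is generated lazily."""
    table: str
    columns: Tuple[str, ...] = ()       # projection; empty means SELECT *
    predicates: Tuple[str, ...] = ()    # conjuncts for the WHERE clause
    group_keys: Tuple[str, ...] = ()    # columns for GROUP BY
    aggregates: Tuple[Tuple[str, str], ...] = ()  # (function, column) pairs

    def project(self, *cols: str) -> "LazyFrame":
        return replace(self, columns=tuple(cols))

    def filter(self, predicate: str) -> "LazyFrame":
        # Raw SQL predicate string; a real transpiler would build expression trees.
        return replace(self, predicates=self.predicates + (predicate,))

    def groupby(self, *keys: str) -> "LazyFrame":
        return replace(self, group_keys=tuple(keys))

    def agg(self, func: str, col: str) -> "LazyFrame":
        return replace(self, aggregates=self.aggregates + ((func, col),))

    def sql(self) -> str:
        """Transpile the recorded operations into a single SQL query."""
        if self.aggregates:
            items = list(self.group_keys) + [
                f"{f}({c}) AS {f.lower()}_{c}" for f, c in self.aggregates
            ]
        elif self.columns:
            items = list(self.columns)
        else:
            items = ["*"]
        query = f"SELECT {', '.join(items)} FROM {self.table}"
        if self.predicates:
            query += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        if self.group_keys:
            query += " GROUP BY " + ", ".join(self.group_keys)
        return query

    def collect(self, conn: sqlite3.Connection) -> list:
        """Ship the generated query to the database and fetch only the result."""
        return conn.execute(self.sql()).fetchall()


# Usage: the data stays in the database; only the aggregated rows come back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 7.5), ("south", 1.0)],
)

df = LazyFrame("sales").filter("amount > 2").groupby("region").agg("SUM", "amount")
print(df.sql())
# SELECT region, SUM(amount) AS sum_amount FROM sales WHERE (amount > 2) GROUP BY region
print(sorted(df.collect(conn)))
# [('north', 17.5), ('south', 5.0)]
```

Because each call only records an operation, the whole pipeline collapses into one query, and rows leave the database only when `collect` runs. A production transpiler would additionally need typed expression objects, identifier quoting, joins, and user-defined functions, all of which this sketch omits.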

Files

Name: A2-4.pdf
Size: 625.19 KB
Format: Adobe Portable Document Format

Collections

P311 - BTW2021 - Datenbanksysteme für Business, Technologie und Web