
Applying Machine Learning Models to Scalable DataFrames with Grizzly

dc.contributor.author: Kläbe, Steffen
dc.contributor.author: Hagedorn, Stefan
dc.contributor.editor: Kai-Uwe Sattler
dc.contributor.editor: Melanie Herschel
dc.contributor.editor: Wolfgang Lehner
dc.date.accessioned: 2021-03-16T07:57:09Z
dc.date.available: 2021-03-16T07:57:09Z
dc.date.issued: 2021
dc.description.abstract: The popular Python Pandas framework provides an easy-to-use DataFrame API that enables a broad range of users to analyze their data. However, Pandas faces severe scalability issues in terms of runtime and memory consumption, limiting the usability of the framework. In this paper we present Grizzly, a replacement for Python Pandas. Instead of bringing data to the operators like Pandas, Grizzly ships program complexity to database systems by transpiling the DataFrame API to SQL code. Additionally, Grizzly offers user-friendly support for combining different data sources, user-defined functions, and applying Machine Learning models directly inside the database system. Our evaluation shows that Grizzly significantly outperforms Pandas as well as state-of-the-art frameworks for distributed Python processing in several use cases.
dc.identifier.doi: 10.18420/btw2021-10
dc.identifier.isbn: 978-3-88579-705-0
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/35793
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: BTW 2021
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311
dc.title: Applying Machine Learning Models to Scalable DataFrames with Grizzly
gi.citation.endPage: 214
gi.citation.startPage: 195
gi.conference.date: 13.-17. September 2021
gi.conference.location: Dresden
gi.conference.sessiontitle: ML & Data Science
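
The abstract describes Grizzly as transpiling DataFrame operations to SQL so that the database system, rather than the Python process, does the work. The following is a minimal sketch of that general idea only. The `LazyFrame` class, its methods, and the `events` table are hypothetical names made up for illustration and are not Grizzly's actual API; SQLite stands in for an arbitrary database system, and predicates are passed as trusted SQL strings for brevity.

```python
import sqlite3


class LazyFrame:
    """Hypothetical DataFrame-like wrapper (not Grizzly's API).

    It records operations and emits a SQL string instead of
    loading rows into Python memory.
    """

    def __init__(self, table, columns="*", predicates=()):
        self.table = table
        self.columns = columns
        self.predicates = tuple(predicates)

    def filter(self, predicate):
        # Record the predicate; nothing is executed yet.
        return LazyFrame(self.table, self.columns, self.predicates + (predicate,))

    def select(self, *columns):
        # Record the projection; nothing is executed yet.
        return LazyFrame(self.table, ", ".join(columns), self.predicates)

    def to_sql(self):
        # Translate the recorded operations into a single SQL query.
        sql = f"SELECT {self.columns} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql

    def collect(self, connection):
        # Only now does the database evaluate the query.
        return connection.execute(self.to_sql()).fetchall()


# Illustrative data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, actor TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", 0), (2, "b", 3), (3, "c", 5)],
)

query = LazyFrame("events").filter("goals > 1").select("id", "actor")
print(query.to_sql())
print(query.collect(conn))
```

Because each call only returns a new `LazyFrame`, no data moves until `collect` runs, and only the filtered, projected rows are transferred to Python. A production transpiler would also need proper expression objects and parameter binding instead of raw predicate strings.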

Files

Original bundle
Name: A2-4.pdf
Size: 625.19 KB
Format: Adobe Portable Document Format