Text document

Applying Machine Learning Models to Scalable DataFrames with Grizzly

Date

2021

Publisher

Gesellschaft für Informatik, Bonn

Abstract

The popular Python Pandas framework provides an easy-to-use DataFrame API that enables a broad range of users to analyze their data. However, Pandas faces severe scalability issues in terms of runtime and memory consumption, limiting the usability of the framework. In this paper, we present Grizzly, a replacement for Python Pandas. Instead of bringing data to the operators as Pandas does, Grizzly ships program complexity to database systems by transpiling the DataFrame API to SQL code. Additionally, Grizzly offers user-friendly support for combining different data sources, for user-defined functions, and for applying Machine Learning models directly inside the database system. Our evaluation shows that Grizzly significantly outperforms Pandas as well as state-of-the-art frameworks for distributed Python processing in several use cases.
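
The general idea of transpiling DataFrame-style calls to SQL can be illustrated with a minimal sketch. This is not Grizzly's API: the class `LazyFrame`, its methods, the `demo` helper, and the `events` table are all hypothetical names chosen for illustration, and SQLite from the Python standard library stands in for the database system. The sketch only shows how operations might be recorded lazily and shipped to the database as a single query, so that filtering happens where the data lives.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy frame: records operations and emits one SQL query."""

    def __init__(self, table, columns="*", conditions=(), params=()):
        self.table = table
        self.columns = columns
        self.conditions = tuple(conditions)
        self.params = tuple(params)

    def filter(self, column, op, value):
        # Only whitelisted operators are accepted; the value is bound as a
        # query parameter rather than spliced into the SQL string.
        if op not in {"=", "<", ">", "<=", ">=", "<>"}:
            raise ValueError(f"unsupported operator: {op}")
        return LazyFrame(
            self.table,
            self.columns,
            self.conditions + (f'"{column}" {op} ?',),
            self.params + (value,),
        )

    def select(self, *columns):
        # Identifiers are double-quoted; embedded quotes are not escaped
        # in this sketch, so column names are assumed to be trusted.
        cols = ", ".join(f'"{c}"' for c in columns)
        return LazyFrame(self.table, cols, self.conditions, self.params)

    def to_sql(self):
        sql = f'SELECT {self.columns} FROM "{self.table}"'
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql

    def collect(self, connection):
        # Nothing is executed until this point; the database does the work.
        return connection.execute(self.to_sql(), self.params).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, kind TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, "a", 0.5), (2, "b", 1.5), (3, "a", 2.5)],
    )
    frame = LazyFrame("events").filter("score", ">", 1.0).select("id", "kind")
    return frame.to_sql(), frame.collect(conn)
```

Calling `demo()` returns the generated SQL string together with the rows the database selected. Only the matching rows cross into the Python process, which is the contrast with loading the full table into memory first.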

Description

Kläbe, Steffen; Hagedorn, Stefan (2021): Applying Machine Learning Models to Scalable DataFrames with Grizzly. BTW 2021. DOI: 10.18420/btw2021-10. Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-705-0. pp. 195-214. ML & Data Science. Dresden. September 13-17, 2021
