Applying Machine Learning Models to Scalable DataFrames with Grizzly
dc.contributor.author | Kläbe, Steffen | |
dc.contributor.author | Hagedorn, Stefan | |
dc.contributor.editor | Kai-Uwe Sattler | |
dc.contributor.editor | Melanie Herschel | |
dc.contributor.editor | Wolfgang Lehner | |
dc.date.accessioned | 2021-03-16T07:57:09Z | |
dc.date.available | 2021-03-16T07:57:09Z | |
dc.date.issued | 2021 | |
dc.description.abstract | The popular Python Pandas framework provides an easy-to-use DataFrame API that enables a broad range of users to analyze their data. However, Pandas faces severe scalability issues in terms of runtime and memory consumption, limiting the usability of the framework. In this paper we present Grizzly, a replacement for Python Pandas. Instead of bringing data to the operators like Pandas, Grizzly ships program complexity to database systems by transpiling the DataFrame API to SQL code. Additionally, Grizzly offers user-friendly support for combining different data sources, user-defined functions, and applying Machine Learning models directly inside the database system. Our evaluation shows that Grizzly significantly outperforms Pandas as well as state-of-the-art frameworks for distributed Python processing in several use cases. | en |
dc.identifier.doi | 10.18420/btw2021-10 | |
dc.identifier.isbn | 978-3-88579-705-0 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/35793 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik, Bonn | |
dc.relation.ispartof | BTW 2021 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-311 | |
dc.title | Applying Machine Learning Models to Scalable DataFrames with Grizzly | en |
gi.citation.endPage | 214 | |
gi.citation.startPage | 195 | |
gi.conference.date | 13.-17. September 2021 | |
gi.conference.location | Dresden | |
gi.conference.sessiontitle | ML & Data Science |
Dateien
Originalbündel
1 - 1 von 1