Conference paper

MLProvLab: Provenance Management for Data Science Notebooks

Document type

Text/Conference Paper

Date

2023

Publisher

Gesellschaft für Informatik e.V.

Abstract

Computational notebooks are a form of computational narrative that fosters reproducibility. They provide an interactive computing environment in which users can run and modify code and repeat their exploration, enabling iterative interaction between data scientists and code. While the ability to execute notebooks non-linearly benefits data scientists during exploration, the drawback is that it is possible to lose track of the datasets, variables, and methods defined in the notebook and of their dependencies. Thus, in this process of user interaction and exploration, execution history information can be lost. To prevent this, a means of maintaining provenance information is needed. Provenance plays a significant role in data science, especially in facilitating the reproducibility of results. To this end, we developed a provenance management tool that helps data scientists track, capture, compare, and visualize provenance information in notebook code environments. We conducted an evaluation with data scientists in which participants were asked to find specific provenance information in the execution history of a machine learning Jupyter notebook. The results of the performance and user evaluation show promising aspects of the tool's provenance management features. The resulting system, MLProvLab, is available as an open-source extension for JupyterLab.

Description

Kerzel, Dominik; König-Ries, Birgitta; Sheeba, Samuel (2023): MLProvLab: Provenance Management for Data Science Notebooks. BTW 2023. DOI: 10.18420/BTW2023-66. Bonn: Gesellschaft für Informatik e.V. ISBN: 978-3-88579-725-8. pp. 965-980. Dresden, Germany. 6-10 March 2023
