MLProvLab: Provenance Management for Data Science Notebooks

dc.contributor.author: Kerzel, Dominik
dc.contributor.author: König-Ries, Birgitta
dc.contributor.author: Sheeba, Samuel
dc.contributor.editor: König-Ries, Birgitta
dc.contributor.editor: Scherzinger, Stefanie
dc.contributor.editor: Lehner, Wolfgang
dc.contributor.editor: Vossen, Gottfried
dc.date.accessioned: 2023-02-23T14:00:15Z
dc.date.available: 2023-02-23T14:00:15Z
dc.date.issued: 2023
dc.description.abstract: Computational notebooks are a form of computational narrative that fosters reproducibility. They provide an interactive computing environment where users can run and modify code and repeat the exploration, enabling iterative communication between data scientists and code. While the ability to execute notebooks non-linearly benefits data scientists during exploration, the drawback is that it is possible to lose control over the datasets, variables, and methods defined in the notebook and their dependencies. Thus, in this process of user interaction and exploration, execution history information can be lost. To prevent this, a way to maintain provenance information is needed. Provenance plays a significant role in data science, especially in facilitating the reproducibility of results. To this end, we developed a provenance management tool to help data scientists track, capture, compare, and visualize provenance information in notebook code environments. We conducted an evaluation with data scientists, in which participants were asked to find specific provenance information in the execution history of a machine learning Jupyter notebook. The results from the performance and user evaluation show promising aspects of the tool's provenance management features. The resulting system, MLProvLab, is available as an open-source extension for JupyterLab. [en]
dc.identifier.doi: 10.18420/BTW2023-66
dc.identifier.isbn: 978-3-88579-725-8
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/40375
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: BTW 2023
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-331
dc.subject: Data Science
dc.subject: Information Extraction
dc.subject: Provenance
dc.subject: Jupyter Notebook
dc.subject: Reproducibility
dc.title: MLProvLab: Provenance Management for Data Science Notebooks [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 980
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 965
gi.conference.date: 6-10 March 2023
gi.conference.location: Dresden, Germany

Files

Original bundle
Name: C3-06.pdf
Size: 997.33 KB
Format: Adobe Portable Document Format