Browsing by keyword "Provenance"
Results 1 - 4 of 4
- Conference paper. MLProvLab: Provenance Management for Data Science Notebooks (BTW 2023, 2023). Kerzel, Dominik; König-Ries, Birgitta; Sheeba, Samuel. Computational notebooks are a form of computational narrative that fosters reproducibility. They provide an interactive computing environment where users can run and modify code and repeat the exploration, providing iterative communication between data scientists and code. While the ability to execute notebooks non-linearly benefits data scientists during exploration, the drawback is that it is possible to lose control over the datasets, variables, and methods defined in the notebook and their dependencies. Thus, in this process of user interaction and exploration, execution history information can be lost. To prevent this, a way to maintain provenance information is needed. Provenance plays a significant role in data science, especially in facilitating the reproducibility of results. To this end, we developed a provenance management tool to help data scientists track, capture, compare, and visualize provenance information in notebook code environments. We conducted an evaluation with data scientists, in which participants were asked to find specific provenance information in the execution history of a machine learning Jupyter notebook. The results from the performance and user evaluation show promising aspects of the tool's provenance management features. The resulting system, MLProvLab, is available as an open-source extension for JupyterLab.
- Conference paper. ReStoRunT: Simple Recording, Storing, Running and Tracing changes in Spreadsheets (BTW 2023, 2023). Wolfgang, Müller; Mertová, Lukrécia. In addition to the ubiquitous big data, one key challenge in data processing and management in the life sciences is the diversity of small data. Diverse pieces of small data have to be transformed into standards-compliant data. Here, the challenge lies not in the difficulty of single steps that need to be performed, but rather in the fact that many transformation tasks are to be performed once or only a few times. This limits the time that can be put into automated approaches, which in turn severely limits the verifiability of such transformations. As much of the data to be processed is stored in spreadsheets, within this paper we justify and propose a lightweight recording-based solution that works on a wide variety of spreadsheet programs, from Microsoft Excel to Google Docs.
- Journal article. Scientific Workflows and Provenance: Introduction and Research Opportunities (Datenbank-Spektrum: Vol. 12, No. 3, 2012). Cuevas-Vicenttín, Víctor; Dey, Saumen; Köhler, Sven; Riddle, Sean; Ludäscher, Bertram. Scientific workflows are becoming increasingly popular for compute-intensive and data-intensive scientific applications. The vision and promise of scientific workflows include rapid, easy workflow design, reuse, scalable execution, and other advantages, e.g., facilitating “reproducible science” through provenance (e.g., data lineage) support. However, as described in the paper, important research challenges remain. While the database community has studied (business) workflow technologies extensively in the past, most current work in scientific workflows seems to be done outside of the database community, e.g., by practitioners and researchers in the computational sciences and eScience. We provide a brief introduction to scientific workflows and provenance, and identify areas and problems that suggest new opportunities for database research.
- Text document. Tracing the History of the Baltic Sea Oxygen Level (BTW 2021, 2021). Auge, Tanja; Heuer, Andreas. In order to guarantee the reproducibility of research results, large research communities, conferences and journals increasingly demand the provision of original research data. Since this is often not possible or desired, a certain tact and sensitivity is needed. With our method, which combines provenance and evolution, we can identify the source tuples necessary for the reconstruction of a query result, including in temporal databases. To avoid dirty data caused by the inverse evolution, we introduced the what-provenance, which remembers the data types of the source relation.