
Analyzing the reproducibility of research-related Jupyter notebooks at scale

dc.contributor.author: Mietchen, Daniel
dc.contributor.author: Samuel, Sheeba
dc.contributor.editor: Koziolek, Anne
dc.contributor.editor: Lamprecht, Anna-Lena
dc.contributor.editor: Thüm, Thomas
dc.contributor.editor: Burger, Erik
dc.date.accessioned: 2025-02-14T09:36:31Z
dc.date.available: 2025-02-14T09:36:31Z
dc.date.issued: 2025
dc.description.abstract [en]: We address computational reproducibility of publication-associated Jupyter notebooks at three levels: (i) Using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining each article’s full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central has grown in a highly dynamic fashion. (iii) We imported the corpus into a knowledge graph with a public SPARQL endpoint that allows for fine-grained exploration of notebooks individually or in aggregation (e.g., by topic, by journal, or by error type). In this talk, we zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
dc.identifier.doi: 10.18420/se2025-07
dc.identifier.eissn: 2944-7682
dc.identifier.issn: 2944-7682
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45813
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: Software Engineering 2025
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-360
dc.subject: Knowledge Graph
dc.subject: Computational reproducibility
dc.subject: Jupyter notebooks
dc.subject: FAIR data
dc.subject: PubMed Central
dc.subject: GitHub
dc.subject: Python
dc.subject: SPARQL
dc.title [en]: Analyzing the reproducibility of research-related Jupyter notebooks at scale
mci.conference.date: 22-28 February 2025
mci.conference.location: Karlsruhe
mci.conference.sessiontitle: Scientific Programme
mci.reference.pages: 39-40

Files

Original bundle
Name: B2-1.pdf
Size: 79.42 KB
Format: Adobe Portable Document Format