Listing by keyword "Reproducibility"
1 - 9 of 9
- Journal article: A Link is not Enough – Reproducibility of Data (Datenbank-Spektrum: Vol. 19, No. 2, 2019). Pawlik, Mateusz; Hütter, Thomas; Kocher, Daniel; Mann, Willi; Augsten, Nikolaus. Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and that subtle differences during data conversion can have a large impact on runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report on our own experience, and exemplify our best practices. (A minimal sketch of how data-conversion choices can change an experiment's input follows after this listing.)
- Conference paper: Community Expectations for Research Artifacts and Evaluation Processes (Software Engineering 2023, 2023). Hermann, Ben; Winter, Stefan; Siegmund, Janet. Artifact evaluation has been introduced into the software engineering and programming languages research community with a pilot at ESEC/FSE 2011 and has since enjoyed healthy adoption throughout the conference landscape. We conducted a survey of all members of artifact evaluation committees of major conferences in the software engineering and programming languages field from 2011 to 2019 and compared the answers to the expectations set by calls for artifacts and reviewing guidelines. While we find that some expectations exceed those expressed in calls and reviewing guidelines, there is no consensus on a quality threshold for artifacts in general. We observe very specific quality expectations for specific artifact types for review and later usage, but also a lack of their communication in calls. We also find problematic inconsistencies in the terminology used to express artifact evaluation's most important purpose. We derive several actionable suggestions which can help to mature artifact evaluation in the inspected community and to aid its introduction into other communities in computer science.
- Journal article: Evaluation Infrastructures for Academic Shared Tasks (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Schaible, Johann; Breuer, Timo; Tavakolpoursaleh, Narges; Müller, Bernd; Wolff, Benjamin; Schaer, Philipp. Academic search systems aid users in finding information covering specific topics of scientific interest and have evolved from early catalog-based library systems to modern web-scale systems. However, evaluating the performance of the underlying retrieval approaches remains a challenge. An increasing number of requirements for producing accurate retrieval results have to be considered, e.g., close integration of the system's users. Due to these requirements, small to mid-size academic search systems cannot evaluate their retrieval systems in-house. Evaluation infrastructures for shared tasks alleviate this situation. They allow researchers to experiment with retrieval approaches in specific search and recommendation scenarios without building their own infrastructure. In this paper, we elaborate on the benefits and shortcomings of four state-of-the-art evaluation infrastructures for search and recommendation tasks with respect to the following requirements: support for online and offline evaluations, domain specificity of shared tasks, and reproducibility of experiments and results. In addition, we introduce an evaluation infrastructure concept design aimed at reducing these shortcomings in shared tasks for search and recommender systems.
- Conference paper: Exploring Existing Tools for Managing Different Types of Research Data (INFORMATIK 2024, 2024). Freund, Adrian; Hajiabadi, Hamideh; Koziolek, Anne. Data management is important for the reproducibility of scientific research. One important aspect of data management is version control. In software development, version control tools like Git are commonly used to track source code changes and releases, reproduce earlier versions, find defects, and simplify their repair. In scientific research, scientists often have to manage large amounts of data while also trying to achieve reproducibility of results and wanting to identify and repair defects in the data. Version control software like Git is specialized for managing source code and other textual files, which often makes it unsuitable for managing other types of data. This creates a need for version control tools specialized for research data. This paper establishes requirements for version control tools for research data and evaluates Git Large File Storage, Neptune, Pachyderm, DVC, and Snowflake against those requirements. We found that none of the evaluated tools fulfills all of our requirements, but we still recommend DVC, Git LFS, and Pachyderm for the use cases they do support. (A hedged sketch of reading a pinned data version via DVC's Python API follows after this listing.)
- Journal article: Mapping platforms into a new open science model for machine learning (it - Information Technology: Vol. 61, No. 4, 2019). Weißgerber, Thomas; Granitzer, Michael. Data-centric disciplines like machine learning and data science have become major research areas within computer science and beyond. However, the development of research processes and tools has not kept pace with the rapid advancement of these disciplines, leaving several insufficiently tackled challenges in attaining reproducibility, replicability, and comparability of achieved results. In this discussion paper, we review existing tools, platforms, and standardization efforts for addressing these challenges. As a common ground for our analysis, we develop an open-science-centred process model for machine learning research, which combines openness and transparency with the core processes of machine learning and data science. Based on the features of over 40 tools, platforms, and standards, we list the 11 platforms we consider most central to the research process. We conclude that most platforms cover only parts of the requirements for overcoming the identified challenges.
- Conference paper: MLProvCodeGen: A Tool for Provenance Data Input and Capture of Customizable Machine Learning Scripts (BTW 2023, 2023). Mustafa, Tarek Al; König-Ries, Birgitta; Samuel, Sheeba. Over the last decade, machine learning (ML) has dramatically changed the application of and research in computer science. It becomes increasingly complicated to assure the transparency and reproducibility of advanced ML systems from raw data to deployment. In this paper, we describe an approach that supplies users with an interface to specify a variety of parameters that together provide complete provenance information, and that automatically generates executable ML code from this information. We introduce MLProvCodeGen (Machine Learning Provenance Code Generator), a JupyterLab extension to generate custom code for ML experiments from user-defined metadata. ML workflows can be generated with different data settings, model parameters, methods, and training parameters, and results can be reproduced in Jupyter Notebooks. We evaluated our approach with two ML applications, image and multiclass classification, and conducted a user evaluation. (A generic sketch of metadata-driven code generation follows after this listing.)
- Conference paper: MLProvLab: Provenance Management for Data Science Notebooks (BTW 2023, 2023). Kerzel, Dominik; König-Ries, Birgitta; Samuel, Sheeba. Computational notebooks are a form of computational narrative fostering reproducibility. They provide an interactive computing environment where users can run and modify code and repeat the exploration, providing an iterative communication between data scientists and code. While the ability to execute notebooks non-linearly benefits data scientists in their exploration, the drawback is that it is possible to lose control over the datasets, variables, and methods defined in the notebook and their dependencies. Thus, in this process of user interaction and exploration, execution history information can be lost. To prevent this, a means of maintaining provenance information is needed. Provenance plays a significant role in data science, especially in facilitating the reproducibility of results. To this end, we developed a provenance management tool to help data scientists track, capture, compare, and visualize provenance information in notebook code environments. We conducted an evaluation with data scientists in which participants were asked to find specific provenance information in the execution history of a machine learning Jupyter notebook. The results of the performance and user evaluation show promising aspects of the tool's provenance management features. The resulting system, MLProvLab, is available as an open-source extension for JupyterLab. (A minimal sketch of capturing cell-execution provenance with IPython's event hooks follows after this listing.)
- Conference paper: A Provenance Management Framework for Knowledge Graph Generation in a Web Portal (BTW 2023, 2023). Kleinsteuber, Erik; Babalou, Samira; König-Ries, Birgitta. Knowledge Graphs (KGs) are the semantic backbone for a wide variety of applications in different domains. In recent years, different web portals providing relevant functionalities for managing KGs have been proposed. An important functionality of such portals is provenance data management for the KG generation process. Capturing, storing, and accessing provenance data efficiently are complex problems. Solutions to these problems vary widely depending on many factors, such as the computational environment, the computational methods, the desired provenance granularity, and much more. In this paper, we present one possible solution: a new framework to capture coarse-grained workflow provenance of KGs during their creation in a web portal. We capture the necessary information about the KG generation process and store and retrieve the provenance data using standard functionality of relational databases. Our captured workflow can be rerun over the same or different input source data. With this, the framework can support four different applications of provenance data: (i) reproduce the KG, (ii) create a new KG with an existing workflow, (iii) undo the executed tools and adapt the provenance data accordingly, and (iv) retrieve the provenance data of a KG. (A minimal relational schema for such workflow provenance is sketched after this listing.)
- Conference paper: Reproducing Taint-Analysis Results with ReproDroid (Software Engineering 2020, 2020). Pauck, Felix; Bodden, Eric; Wehrheim, Heike. More and more Android taint-analysis tools appear each year. Any paper proposing such a tool typically comes with an in-depth evaluation of its supported features, accuracy, and ability to be applied to real-world apps. Although the authors spent a lot of effort on these evaluations, comparability is often hindered because the description of their experimental targets is usually limited. To conduct a comparable, automatic, and unbiased evaluation of different analysis tools, we propose the framework ReproDroid. The framework enables us to precisely declare our evaluation targets; as a consequence, we refine three well-known benchmarks: DroidBench, ICCBench and DIALDroidBench. Furthermore, we instantiate this framework for six prominent taint-analysis tools, namely Amandroid, DIALDroid, DidFail, DroidSafe, FlowDroid and IccTA. Finally, we use these instances to automatically check whether different promises commonly made in the associated proposing papers are kept.
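
For the entry "A Link is not Enough", the following minimal Python sketch illustrates the abstract's point that seemingly equivalent data-preparation choices yield different experimental inputs. The toy data, column names, and cleaning steps are invented for illustration and are not the authors' pipeline.

```python
# Illustrative only: two plausible ways to "prepare" the same raw CSV file
# produce inputs of different size and content, so downstream runtime
# measurements are not directly comparable.
import csv
import io

RAW = "id, label \n1, A \n2, a \n2, a \n3,  B\n"   # toy stand-in for a raw dataset

def prepare_naive(text):
    """Take every data row exactly as it appears in the file."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1:]                                 # drop header only

def prepare_cleaned(text):
    """Strip whitespace, normalize case, drop duplicate rows."""
    rows = list(csv.reader(io.StringIO(text)))
    data = [tuple(cell.strip().upper() for cell in row) for row in rows[1:]]
    return sorted(set(data))

naive = prepare_naive(RAW)      # 4 rows, original spellings
cleaned = prepare_cleaned(RAW)  # 3 rows, normalized and deduplicated
print(len(naive), naive)
print(len(cleaned), cleaned)
```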
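For the entry "Exploring Existing Tools for Managing Different Types of Research Data", here is a hedged sketch of data versioning with DVC, one of the recommended tools. It assumes a Git repository in which the file `data/measurements.csv` is already tracked by DVC and a tag `v1.0` exists; repository URL, path, and tag are placeholders.

```python
# Hedged sketch: read a specific, pinned revision of a DVC-tracked dataset
# through DVC's Python API. All identifiers below are hypothetical.
import dvc.api

DATA_PATH = "data/measurements.csv"                   # hypothetical tracked file
REPO_URL = "https://example.com/lab/experiment.git"   # hypothetical repository
REVISION = "v1.0"                                     # git tag pinning the data version

# Contents of the file exactly as they were at the pinned revision.
text = dvc.api.read(DATA_PATH, repo=REPO_URL, rev=REVISION, encoding="utf-8")

# Resolve where that version lives in remote storage without checking it out.
url = dvc.api.get_url(DATA_PATH, repo=REPO_URL, rev=REVISION)

print(len(text.splitlines()), "rows at revision", REVISION)
print("stored at:", url)
```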
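For the MLProvCodeGen entry, the following generic sketch shows the idea of rendering an executable experiment script from user-defined metadata. The metadata keys and template are invented for this illustration and do not reflect MLProvCodeGen's actual interface.

```python
# Generic illustration of metadata-driven code generation: experiment metadata
# (which doubles as provenance) is rendered into a runnable training script.
from string import Template

metadata = {                      # would normally come from a form or UI
    "dataset": "load_digits",     # scikit-learn loader name (assumed available)
    "model": "LogisticRegression",
    "max_iter": 200,
    "test_size": 0.2,
    "random_state": 42,
}

TEMPLATE = Template("""\
# Auto-generated experiment script (provenance: $metadata)
from sklearn.datasets import $dataset
from sklearn.linear_model import $model
from sklearn.model_selection import train_test_split

X, y = $dataset(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=$test_size, random_state=$random_state)
clf = $model(max_iter=$max_iter).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
""")

script = TEMPLATE.substitute(metadata=metadata, **metadata)
with open("generated_experiment.py", "w") as f:
    f.write(script)               # the generated file records its own metadata
print(script)
```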
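For the MLProvLab entry, this is a minimal sketch of capturing cell-execution provenance inside a Jupyter/IPython session using IPython's public event hooks. It is a generic illustration of the underlying idea, not MLProvLab's implementation, and it assumes a recent IPython in which `post_run_cell` callbacks receive an execution result object.

```python
# Record what ran in each cell, when, and whether it succeeded.
import datetime

from IPython import get_ipython

provenance_log = []   # in-memory history; a real tool would persist this

def record_cell(result):
    """post_run_cell callback: append one provenance record per executed cell."""
    provenance_log.append({
        "timestamp": datetime.datetime.now().isoformat(),
        "code": result.info.raw_cell,
        "success": result.success,
    })

ip = get_ipython()
if ip is not None:                       # only active inside IPython/Jupyter
    ip.events.register("post_run_cell", record_cell)

# Later, e.g. in another cell:
#   provenance_log[-1]  ->  {'timestamp': ..., 'code': ..., 'success': True}
```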
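For the entry on KG provenance management in a web portal, the following sketch shows coarse-grained workflow provenance kept in a relational database, in the spirit of the abstract. The schema, field names, and tool names are invented for illustration; the paper's framework is not reproduced here.

```python
# One row per executed workflow step of a KG build; enough to retrieve,
# rerun, or undo the workflow later.
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")        # stand-in for the portal's database
conn.execute("""
    CREATE TABLE workflow_step (
        kg_id       TEXT NOT NULL,        -- which knowledge graph was built
        step_no     INTEGER NOT NULL,     -- position in the generation workflow
        tool        TEXT NOT NULL,        -- tool that was executed
        parameters  TEXT NOT NULL,        -- JSON-encoded tool parameters
        input_uri   TEXT NOT NULL,        -- source data the step consumed
        output_uri  TEXT NOT NULL,        -- artifact the step produced
        executed_at TEXT NOT NULL
    )
""")

def record_step(kg_id, step_no, tool, parameters, input_uri, output_uri):
    conn.execute(
        "INSERT INTO workflow_step VALUES (?, ?, ?, ?, ?, ?, ?)",
        (kg_id, step_no, tool, json.dumps(parameters), input_uri, output_uri,
         datetime.now(timezone.utc).isoformat()),
    )

def provenance_of(kg_id):
    """Return the ordered workflow of a KG, e.g. to rerun or undo it."""
    cols = ["step_no", "tool", "parameters", "input_uri", "output_uri", "executed_at"]
    rows = conn.execute(
        "SELECT step_no, tool, parameters, input_uri, output_uri, executed_at "
        "FROM workflow_step WHERE kg_id = ? ORDER BY step_no", (kg_id,))
    return [dict(zip(cols, row)) for row in rows]

record_step("kg-demo", 1, "csv2rdf", {"delimiter": ","}, "raw/data.csv", "kg/base.ttl")
record_step("kg-demo", 2, "enrich", {"ontology": "schema.org"}, "kg/base.ttl", "kg/final.ttl")
print(provenance_of("kg-demo"))
```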