Konferenzbeitrag
Exploring Existing Tools for Managing Different Types of Research Data
Lade...
Volltext URI
Dokumententyp
Text/Conference Paper
Zusatzinformation
Datum
2024
Autor:innen
Zeitschriftentitel
ISSN der Zeitschrift
Bandtitel
Quelle
Verlag
Gesellschaft für Informatik e.V.
Zusammenfassung
Data management is important for the reproducibility of scientific research. One important aspect of data management is version control. In software development, version control tools like Git are commonly used to track source code changes and releases, reproduce earlier versions, find defects, and simplify their repair. In scientific research, scientists often have to manage large amounts of data, while also trying to achieve reproducibility of results and wanting to identify and repair defects in the data. Version control software like Git is specialized for managing source code and other textual files, making it often unsuitable for managing other types of data. This creates a need for version control tools specialized for dealing with research data. This paper establishes requirements for version control tools for research data and evaluates Git Large File Storage, Neptune, Pachyderm, DVC, and Snowflake according to those requirements. We found that none of the evaluated tools fulfill all of our requirements, but we still recommend DVC, Git LFS, and Pachyderm for the use cases they do support.