Scenario-based Data Set Generation for Use in Digital Forensics: A Case Study

Göbel, ThomasBaier, HaraldWolf, DennisKlein, MaikeKrupka, DanielWinter, CorneliaGergeleit, MartinMartin, Ludger2024-10-212024-10-212024978-3-88579-746-32944-7682https://dl.gi.de/handle/20.500.12116/45183Digital forensics is a rapidly growing and highly relevant field of cybersecurity. In case of an incident, the subsequent digital forensic investigation and analysis shall reveal the respective digital evidence. However, although electronic devices and their data play a central role in each crime investigation, data sets to train experts or to validate tools are sparse. While manual data set generation is a time-consuming, elaborate and error-prone task, tool-based data synthesis is an excellent candidate for simplifying data generation and solving the data set gap problem. Synthetic data sets can be used, for example, to test and refine forensic tools and methods under controlled conditions. In addition, entirely new approaches can be explored. Several promising data synthesis frameworks for digital forensic data set creation have been published lately, the most recent of which is ForTrace, a freely available, community-driven data synthesis framework written in Python for generating digital forensic data sets. This paper shows how to apply ForTrace in a large-scale manner without human interaction. Our main goal is to show the usability of ForTrace and demonstrate its practicality and benefits for the digital forensic domain. We therefore provide a sample usage of ForTrace in two scenarios, namely a VeraCrypt and a malware use case, and present the definition of the corresponding configurations.enDigital forensic data setDigital corporaSynthetic dataGround truth dataLabeled data setData set generationData set creationData synthesis frameworkForTraceScenario-based Data Set Generation for Use in Digital Forensics: A Case StudyText/Conference Paper10.18420/inf2024_251617-54682944-7682