Text document

Ongoing Automated Data Set Generation for Vulnerability Prediction from Github Data

Date

2022

Publisher

Gesellschaft für Informatik, Bonn

Abstract

This paper describes the development of a continuous GitHub repository analysis pipeline focused on creating a data set for vulnerability prediction in source code. Data sets currently in use consist only of source code functions or methods, without additional meta information. This paper assumes that the code surrounding vulnerable functions can improve the detection rate. Testing this assumption requires large data sets, which the proposed pipeline can create. Although the pipeline still requires some improvements, a first test run analyzed and evaluated 1.5 million repositories. The resulting data set will be published in the future.

Description

Hinrichs, Torge (2022): Ongoing Automated Data Set Generation for Vulnerability Prediction from Github Data. GI SICHERHEIT 2022. DOI: 10.18420/sicherheit2022_17. Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-717-3. pp. 219-222. Practitioners Track. Karlsruhe. April 5-8, 2022
