Using Pre-trained Transformers to Detect Malicious Source Code Within JavaScript Packages

Ohm, MarcGötz, AnjaKlein, MaikeKrupka, DanielWinter, CorneliaGergeleit, MartinMartin, Ludger2024-10-212024-10-212024978-3-88579-746-32944-7682https://dl.gi.de/handle/20.500.12116/45200The proliferation of open source software reuse has led to a significant increase in software supply chain attacks, making it increasingly challenging to identify malicious packages amidst the sheer volume of available packages. Traditional static analysis methods often fall short in detecting these threats due to the complexity and diversity of code semantics. This paper addresses these challenges by leveraging the remarkable success of transformer models in understanding code semantics. We propose a novel approach that utilizes pre-trained transformer models to embed source code, followed by training classifiers on these embeddings. This methodology enables a more nuanced understanding of code semantics, significantly improving the detection of malicious packages. Through extensive experiments, our approach achieves F1-scores as high as 0.98 and an alert rate of 0.09%, demonstrating its effectiveness in detecting malicious code within open source software supply chains.enTransformersMalicious PackagesSoftware Supply ChainUsing Pre-trained Transformers to Detect Malicious Source Code Within JavaScript PackagesText/Conference Paper10.18420/inf2024_401617-54682944-7682