Conference Paper

Bug Detection and Localization using Pre-trained Code Language Models

Document Type

Text/Conference Paper

Additional Information

Date

2024

Authors

Campos, Viola

Publisher

Gesellschaft für Informatik e.V.

Abstract

Language models for source code have improved significantly with the emergence of Transformer-based Large Language Models (LLMs). These models are trained on large amounts of code in which defects are relatively rare, causing them to perceive faulty code as unlikely and correct code as more 'natural', assigning the latter a higher likelihood. We hypothesize that the likelihood scores produced by an LLM can be used directly as a lightweight approach to detecting and localizing bugs in source code. In this study, we evaluate various methods for constructing a suspiciousness score for faulty code segments from LLM likelihoods. Our results demonstrate that these methods can detect buggy methods in a common benchmark with up to 78% accuracy. However, using LLMs directly for fault localization raises concerns about training data leakage, as common benchmarks are often already part of such models' training data and have thus been learned. By additionally evaluating our experiments on a small, non-public dataset of student submissions to programming exercises, we show that leakage is indeed an issue: the evaluation results on the two datasets differ significantly.
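The core idea of the abstract can be illustrated compactly. The following is a minimal sketch, not the paper's actual method: it scores a snippet by its mean token negative log-likelihood under a pre-trained causal code LM via the Hugging Face transformers API. The checkpoint name (Salesforce/codegen-350M-mono) and the mean-NLL aggregation are illustrative assumptions; a higher score marks code the model finds less 'natural' and hence more suspicious.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any causal code LM would do; this checkpoint is only an example.
MODEL_NAME = "Salesforce/codegen-350M-mono"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def suspiciousness(code: str) -> float:
    """Mean negative log-likelihood of the snippet's tokens under the LM.

    Higher values mean the model finds the code less 'natural',
    which we take here as a proxy for suspiciousness.
    """
    enc = tokenizer(code, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the average
        # next-token cross-entropy (i.e., the mean NLL) as `loss`.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

# Toy example: one would expect the off-by-operator bug to score
# as less natural (higher NLL) than the fixed version.
buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
print(f"buggy: {suspiciousness(buggy):.3f}  fixed: {suspiciousness(fixed):.3f}")
```

In a full fault-localization setting, such scores would be computed per method or code segment and ranked; the paper evaluates several ways of constructing the suspiciousness score from LLM likelihoods.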

Description

Campos, Viola (2024): Bug Detection and Localization using Pre-trained Code Language Models. INFORMATIK 2024. DOI: 10.18420/inf2024_124. Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-746-3. pp. 1419-1429. AI@WORK, Wiesbaden, 24.-26. September 2024.
