Bug Detection and Localization using Pre-trained Code Language Models

Campos, Viola

dc.contributor.author	Campos, Viola
dc.contributor.editor	Klein, Maike
dc.contributor.editor	Krupka, Daniel
dc.contributor.editor	Winter, Cornelia
dc.contributor.editor	Gergeleit, Martin
dc.contributor.editor	Martin, Ludger
dc.date.accessioned	2024-10-21T18:24:13Z
dc.date.available	2024-10-21T18:24:13Z
dc.date.issued	2024
dc.description.abstract	Language models for source code have improved significantly with the emergence of Transformer-based Large Language Models (LLMs). These models are trained on large amounts of code in which defects are relatively rare, causing them to perceive faulty code as unlikely and correct code as more 'natural,' thus assigning it a higher likelihood. We hypothesize that the likelihood scores generated by an LLM can be directly used as a lightweight approach to detect and localize bugs in source code. In this study, we evaluate various methods to construct a suspiciousness score for faulty code segments based on LLM likelihoods. Our results demonstrate that these methods can detect buggy methods in a common benchmark with up to 78% accuracy. However, using LLMs directly for fault localization raises concerns about training data leakage, as common benchmarks are often already incorporated into the training data of such models and thus learned. By additionally evaluating our experiments on a small, non-public dataset of student submissions to programming exercises, we show that leakage is indeed an issue, as the evaluation results on both datasets differ significantly.	en
dc.identifier.doi	10.18420/inf2024_124
dc.identifier.eissn	2944-7682
dc.identifier.isbn	978-3-88579-746-3
dc.identifier.issn	2944-7682
dc.identifier.pissn	1617-5468
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/45097
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e.V.
dc.relation.ispartof	INFORMATIK 2024
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject	Fault Detection
dc.subject	Fault Localization
dc.subject	AI4SE
dc.subject	LLM4SE
dc.title	Bug Detection and Localization using Pre-trained Code Language Models	en
dc.type	Text/Conference Paper
gi.citation.endPage	1429
gi.citation.publisherPlace	Bonn
gi.citation.startPage	1419
gi.conference.date	24-26 September 2024
gi.conference.location	Wiesbaden
gi.conference.sessiontitle	AI@WORK
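
The abstract's idea of turning LLM likelihoods into a suspiciousness score can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual method: the per-token log-probabilities are assumed to come from some code language model (not shown here), the aggregation by mean negative log-likelihood per line is just one plausible choice, and the names `line_suspiciousness` and `rank_lines` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def line_suspiciousness(token_logprobs: List[Tuple[int, float]]) -> Dict[int, float]:
    """Aggregate per-token log-probabilities into one score per source line.

    token_logprobs holds (line_number, log_probability) pairs, assumed to be
    produced by a code language model for each token of a method.
    The score is the mean negative log-likelihood of the line's tokens,
    so a higher score means the model found the line less 'natural'.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for line_no, logprob in token_logprobs:
        sums[line_no] -= logprob  # negative log-likelihood
        counts[line_no] += 1
    return {line_no: sums[line_no] / counts[line_no] for line_no in sums}


def rank_lines(scores: Dict[int, float]) -> List[int]:
    """Return line numbers ordered from most to least suspicious."""
    return sorted(scores, key=lambda line_no: scores[line_no], reverse=True)


# Made-up log-probabilities for illustration only (not real model output).
example = [(1, -0.1), (1, -0.3), (2, -2.5), (2, -1.5), (3, -0.5)]
scores = line_suspiciousness(example)
ranking = rank_lines(scores)
print(ranking)  # line 2 has the lowest likelihood and is ranked first
```

A method-level detection score could then be derived from the line scores, for example by taking their maximum and comparing it to a threshold; which aggregation works best is exactly the kind of question the paper evaluates.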

Files

Original bundle

Name: Campos_Bug_Detection_and_Localization.pdf
Size: 825.65 KB
Format: Adobe Portable Document Format

Collections

P352 - INFORMATIK 2024 - Lock in or log out? Wie digitale Souveränität gelingt