
Computer-Assisted Short Answer Grading Using Large Language Models and Rubrics

dc.contributor.author: Metzler, Tim
dc.contributor.author: Plöger, Paul G.
dc.contributor.author: Hees, Jörn
dc.contributor.editor: Klein, Maike
dc.contributor.editor: Krupka, Daniel
dc.contributor.editor: Winter, Cornelia
dc.contributor.editor: Gergeleit, Martin
dc.contributor.editor: Martin, Ludger
dc.date.accessioned: 2024-10-21T18:24:13Z
dc.date.available: 2024-10-21T18:24:13Z
dc.date.issued: 2024
dc.description.abstract: Grading student answers and providing feedback are essential yet time-consuming tasks for educators. Recent advancements in Large Language Models (LLMs), including ChatGPT, Llama, and Mistral, have paved the way for automated support in this domain. This paper investigates the efficacy of instruction-following LLMs in adhering to predefined rubrics for evaluating student answers and delivering meaningful feedback. Leveraging the Mohler dataset and a custom German dataset, we evaluate various models, from commercial ones like ChatGPT to smaller open-source options like Llama, Mistral, and Command R. Additionally, we explore the impact of temperature parameters and techniques such as few-shot prompting. Surprisingly, while few-shot prompting brings grading accuracy closer to the ground truth, it introduces model inconsistency. Furthermore, some models exhibit non-deterministic behavior even at near-zero temperature settings. Our findings highlight the importance of rubrics in enhancing the interpretability of model outputs and fostering consistency in grading practices.
dc.identifier.doi: 10.18420/inf2024_121
dc.identifier.isbn: 978-3-88579-746-3
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45094
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: INFORMATIK 2024
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject: Natural Language Processing
dc.subject: Automatic Short Answer Grading
dc.subject: Large Language Models
dc.subject: Rubrics
dc.subject: Mistral
dc.subject: ChatGPT
dc.subject: Llama
dc.title: Computer-Assisted Short Answer Grading Using Large Language Models and Rubrics
dc.type: Text/Conference Paper
gi.citation.endPage: 1393
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 1383
gi.conference.date: 24.-26. September 2024
gi.conference.location: Wiesbaden
gi.conference.sessiontitle: AI@WORK
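
Illustration: the abstract above describes rubric-guided grading with instruction-following LLMs, few-shot prompting, and near-zero temperature settings. The sketch below is a minimal, hypothetical example of that general setup, not the authors' actual pipeline; the model name, rubric text, question, and answers are placeholders invented for illustration.

    # Hypothetical sketch of rubric-based short answer grading with a chat LLM.
    # Rubric, question, answers, and scoring scale are illustrative placeholders,
    # not taken from the paper or the Mohler dataset.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    rubric = (
        "Award 5 points if the answer gives both the definition and an example; "
        "3 points if only the definition is given; 0 points otherwise."
    )
    question = "What is a hash table?"
    reference_answer = "A data structure that maps keys to values using a hash function."
    student_answer = "It stores key-value pairs and looks them up with a hash function."

    # One worked example used for few-shot prompting (illustrative).
    few_shot = [
        {"role": "user", "content": (
            "Question: What is recursion?\n"
            "Reference answer: A function calling itself.\n"
            "Student answer: When a function calls itself.\n"
            f"Rubric: {rubric}\n"
            "Return a score and one sentence of feedback.")},
        {"role": "assistant", "content": "Score: 5/5. The answer matches the reference definition."},
    ]

    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        temperature=0.0,       # near-zero temperature, as discussed in the abstract
        messages=[
            {"role": "system", "content": "You are a grader. Follow the rubric strictly."},
            *few_shot,
            {"role": "user", "content": (
                f"Question: {question}\n"
                f"Reference answer: {reference_answer}\n"
                f"Student answer: {student_answer}\n"
                f"Rubric: {rubric}\n"
                "Return a score and one sentence of feedback.")},
        ],
    )
    print(response.choices[0].message.content)

The same prompt structure (system instruction, few-shot example, rubric, and student answer) can be sent to open-source chat models such as Llama, Mistral, or Command R through their respective inference APIs.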

Files

Original bundle
Name: Metzler_et_al_Computer-Assisted_Short_Answer_Grading.pdf
Size: 441.38 KB
Format: Adobe Portable Document Format