Assessing Large Language Models for Annotating Data in Dementia-Related Texts: A Comparative Study with Human Annotators
dc.contributor.author | Suravee, Sumaiya | |
dc.contributor.author | Stoev, Teodor | |
dc.contributor.author | Konow, Sara | |
dc.contributor.author | Yordanova, Kristina | |
dc.contributor.editor | Klein, Maike | |
dc.contributor.editor | Krupka, Daniel | |
dc.contributor.editor | Winter, Cornelia | |
dc.contributor.editor | Gergeleit, Martin | |
dc.contributor.editor | Martin, Ludger | |
dc.date.accessioned | 2024-10-21T18:24:26Z | |
dc.date.available | 2024-10-21T18:24:26Z | |
dc.date.issued | 2024 | |
dc.description.abstract | As the aging population grows, the incidence of dementia is rising sharply, necessitating the extraction of domain-specific information from texts to gain valuable insights into the condition. Training Natural Language Processing (NLP) models for this purpose requires substantial amounts of annotated data, which is typically produced by human annotators. While human annotation is precise, it is also labor-intensive and costly. Large Language Models (LLMs) present a promising alternative that could potentially streamline and economize the annotation process. However, LLMs may struggle with complex, domain-specific contexts, potentially leading to inaccuracies. This paper investigates the effectiveness of LLMs in annotating words and phrases in ambiguous dementia-related texts by comparing LLM-generated annotations with those produced by human annotators. We followed a specific annotation scheme and had both the LLM and human raters annotate a corpus of informal texts from forums of family carers of people with dementia. The results indicate a moderate overlap in inter-rater agreement between LLM and expert annotators, with the LLM identifying nearly twice as many instances as the human raters. Although LLMs can partially automate the annotation process, they are not yet fully reliable for complex domains. By refining LLM-generated data through expert review, it is possible to reduce the burden on human raters and accelerate the creation of annotated datasets. | en |
dc.identifier.doi | 10.18420/inf2024_36 | |
dc.identifier.isbn | 978-3-88579-746-3 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/45195 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | INFORMATIK 2024 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352 | |
dc.subject | Data Annotation | |
dc.subject | Large Language Model | |
dc.subject | People with Dementia | |
dc.subject | Named Entity Recognition | |
dc.title | Assessing Large Language Models for Annotating Data in Dementia-Related Texts: A Comparative Study with Human Annotators | en
dc.type | Text/Conference Paper | |
gi.citation.endPage | 498 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 487 | |
gi.conference.date | 24–26 September 2024 | |
gi.conference.location | Wiesbaden | |
gi.conference.sessiontitle | 8th International Workshop on Annotation of useR Data for UbiquitOUs Systems |
Files
Original bundle
- Name: Suravee_et_al_Assessing_Large_Language_Models.pdf
- Size: 481.64 KB
- Format: Adobe Portable Document Format