Listing by author "Stoev, Teodor"
- Conference paper: Assessing Large Language Models for annotating data in Dementia-Related texts: A Comparative Study with Human Annotators (INFORMATIK 2024, 2024). Suravee, Sumaiya; Stoev, Teodor; Konow, Sara; Yordanova, Kristina.
  As the aging population grows, the incidence of dementia is rising sharply, necessitating the extraction of domain-specific information from texts to gain valuable insights into the condition. Training Natural Language Processing (NLP) models for this purpose requires substantial amounts of annotated data, which is typically produced by human annotators. While human annotation is precise, it is also labor-intensive and costly. Large Language Models (LLMs) present a promising alternative that could streamline and economize the annotation process. However, LLMs may struggle with complex, domain-specific contexts, potentially leading to inaccuracies. This paper investigates the effectiveness of LLMs in annotating words and phrases in ambiguous dementia-related texts by comparing LLM-generated annotations with those produced by human annotators. We followed a specific annotation scheme and had both the LLM and human raters annotate a corpus of informal texts from forums of family carers of people with dementia. The results indicate moderate inter-rater agreement between the LLM and the expert annotators, with the LLM identifying nearly twice as many instances as the human raters. Although LLMs can partially automate the annotation process, they are not yet fully reliable for complex domains. By refining LLM-generated data through expert review, it is possible to reduce the burden on human raters and accelerate the creation of annotated datasets. (An illustrative sketch of such an agreement computation follows this listing.)
- Conference paper: A Comparative Analysis on Machine Learning Techniques for Research Metadata: the ARDUOUS Case Study (INFORMATIK 2024, 2024). Yadav, Dipendra; Tonkin, Emma; Stoev, Teodor; Yordanova, Kristina.
  The rapid increase in research publications necessitates effective methods for organizing and analyzing large volumes of textual data. This study evaluates various combinations of embedding models, dimensionality reduction techniques, and clustering algorithms applied to metadata from papers accepted at the ARDUOUS (Annotation of useR Data for UbiquitOUs Systems) workshop over a period of seven years. The analysis encompasses different types of keywords, including All Keywords (a comprehensive set of all extracted keywords), Multi-word Keywords (phrases consisting of two or more words), Existing Keywords (keywords already present in the metadata), and Single-word Keywords (individual words). The study found that the highest silhouette scores were achieved with 3, 4, and 5 clusters across all keyword types. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) were identified as the most effective dimensionality reduction techniques, while DistilBERT embeddings consistently yielded high scores. Clustering algorithms such as k-means, k-medoids, and Gaussian Mixture Models (GMM) demonstrated robustness in forming well-defined clusters. These findings provide valuable insights into the main topics covered in the workshop papers and suggest optimal methodologies for analyzing research metadata, thereby enhancing the understanding of semantic relationships in textual data. (A sketch of one such pipeline configuration follows this listing.)
- Conference paper: Variability of annotations over time: An experimental study in the dementia-related named entity recognition domain (INFORMATIK 2024, 2024). Stoev, Teodor; Suravee, Sumaiya; Yordanova, Kristina.
  Data annotation is a crucial step in various domains where Machine Learning (ML) approaches are utilized. Despite the availability of automated and semi-automated data labeling methods, manual annotation by experts remains essential for developing high-quality models in certain scenarios. This study explores how annotations can evolve over time through an experiment focused on annotating named entities and relationships within the domain of dementia and related behaviors. Two annotators labeled a task-specific text corpus on two separate occasions, one year apart. Our findings revealed an increase in both the quantity and quality of annotated entities and relationships for both annotators. Statistical tests were conducted to assess the significance of the changes in annotations. The results indicate substantial variability in annotations over time, particularly in complex domains. The paper also discusses potential reasons for these variations. (A sketch of one such significance test follows this listing.)
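
For the first entry, a minimal sketch of how LLM-generated and human annotations might be compared via Cohen's kappa. The BIO-style label scheme, the example tokens, and the use of scikit-learn are assumptions for illustration, not the paper's actual setup or data.

```python
# Minimal sketch: comparing LLM-generated and human token-level annotations
# with Cohen's kappa. Labels and data are hypothetical; they only illustrate
# the kind of agreement computation described in the abstract.
from sklearn.metrics import cohen_kappa_score

# One label per token, aligned across both annotation sources.
human_labels = ["O", "B-BEHAVIOUR", "I-BEHAVIOUR", "O", "B-SYMPTOM", "O"]
llm_labels   = ["O", "B-BEHAVIOUR", "O",           "O", "B-SYMPTOM", "B-SYMPTOM"]

kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Cohen's kappa (human vs. LLM): {kappa:.2f}")

# The abstract also notes that the LLM marked roughly twice as many spans;
# a simple span-count comparison captures that aspect.
human_spans = sum(label.startswith("B-") for label in human_labels)
llm_spans = sum(label.startswith("B-") for label in llm_labels)
print(f"Annotated spans - human: {human_spans}, LLM: {llm_spans}")
```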
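For the second entry, a sketch of one configuration from the evaluated pipeline (DistilBERT embeddings, PCA, k-means, silhouette score), assuming the Hugging Face transformers and scikit-learn libraries. The keyword list, model checkpoint, and pooling strategy are illustrative assumptions, not the study's actual metadata or settings.

```python
# Sketch of one pipeline configuration from the comparative study:
# DistilBERT embeddings -> PCA -> k-means -> silhouette score per k.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical keywords; the study used metadata from ARDUOUS workshop papers.
keywords = ["activity recognition", "annotation tool", "smart home",
            "sensor data", "user study", "machine learning",
            "data labeling", "wearable sensors"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

# Mean-pool the last hidden state (masking padding) to get one vector per keyword.
with torch.no_grad():
    batch = tokenizer(keywords, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    embeddings = ((hidden * mask).sum(dim=1) / mask.sum(dim=1)).numpy()

# Reduce dimensionality with PCA, then cluster and score each candidate k.
reduced = PCA(n_components=3).fit_transform(embeddings)
for k in (3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(reduced)
    print(f"k={k}: silhouette score = {silhouette_score(reduced, labels):.2f}")
```

The same loop could swap in ICA, k-medoids, or GMM to reproduce the kind of grid comparison the abstract describes.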
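For the third entry, a sketch of one way the round-to-round change could be tested statistically: a Wilcoxon signed-rank test on per-document entity counts. Both the counts and the choice of test are assumptions for illustration, not necessarily those used in the paper.

```python
# Sketch: testing whether per-document entity counts changed significantly
# between two annotation rounds one year apart. The counts are invented and
# the Wilcoxon signed-rank test is an assumed choice of test.
from scipy.stats import wilcoxon

# Entities annotated per document by the same annotator in each round.
round_1 = [12, 8, 15, 9, 11, 7, 14, 10]
round_2 = [16, 11, 18, 9, 15, 10, 17, 13]

statistic, p_value = wilcoxon(round_1, round_2)
print(f"Wilcoxon signed-rank: statistic={statistic:.1f}, p={p_value:.3f}")
```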