Listing by keyword "Large Language Models"
1 - 10 of 25
- Student Paper: AI-based chatbots as enabler for efficient external knowledge management in public administration (7. Fachtagung Rechts- und Verwaltungsinformatik (RVI 2024): Neue Wege der Zusammenarbeit und Vernetzung für digitale Transformation und Verwaltungsmodernisierung, 2024). Wiethölter, Jost; Kühl, Linus; Feldmann, Carsten.
  This study addresses the pressing issue of staff shortages in German public administrations through the lens of digitalization, focusing on the potential of AI-based chatbots to solve this problem by replacing human labour. Employing a Design Science Research Process (DSRP) methodology, the research synthesizes theoretical foundations and regulatory frameworks to develop a robust chatbot concept. The artifact presented is a comprehensive architectural framework integrating user-centric design, linguistic processing, and regulatory compliance. The proposed artifact navigates complex federal structures and diverse IT infrastructures, promoting accessibility and inclusivity. The implications suggest enhanced efficiency and accessibility in public service delivery, potentially increasing citizen satisfaction and decreasing employee workload. The study underscores the importance of legal compliance and the evolving regulatory landscape in AI deployment. Future research will involve prototyping and evaluating the artifact's performance and applicability throughout the course of the DSRP, thus contributing to the advancement of digital transformation in public administrations.
- Conference paper: Assessing Large Language Models in the Agricultural Sector: A Comprehensive Analysis Utilizing a Novel Synthetic Benchmark Dataset (INFORMATIK 2024, 2024). Kästing, Marvin; Hänig, Christian.
  This paper provides a comprehensive study of Large Language Models (LLMs) for question-answering and information-retrieval tasks within the agricultural domain. We introduce the novel benchmark dataset BVL QA Corpus 2024, specifically designed to thoroughly evaluate both commercial and non-commercial LLMs in agricultural contexts. Using LLMs, we generate question-answer pairs from paragraphs extracted from domain-specific agricultural documents. Leveraging this newly developed benchmark dataset, we assess a selection of LLMs using standard metrics. Additionally, we develop a prototype Retrieval-Augmented Generation (RAG) system tailored to the agricultural sector. This system is then compared to the baseline evaluations to determine how closely actual performance matches the initial upper-bound estimates. Our empirical analysis demonstrates that RAG systems outperform baseline LLMs across all metrics.
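  A minimal sketch of the QA-pair generation step described above, assuming an OpenAI-compatible Python client; the model name and prompt wording are placeholders rather than the authors' actual setup:

  ```python
  # Minimal sketch of LLM-based QA-pair generation from a domain paragraph.
  # Model name and prompt wording are illustrative placeholders.
  from openai import OpenAI

  client = OpenAI()  # expects OPENAI_API_KEY in the environment

  def generate_qa_pair(paragraph: str) -> str:
      """Ask the model to derive one grounded question-answer pair."""
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder model
          messages=[
              {"role": "system",
               "content": "You write benchmark questions for agricultural texts."},
              {"role": "user",
               "content": "Write one question and its answer, both fully "
                          "grounded in this paragraph:\n\n" + paragraph},
          ],
      )
      return response.choices[0].message.content

  print(generate_qa_pair("Winter wheat is typically sown between late September and November."))
  ```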
- Conference paper: Assessment Power of ChatGPT in the Context of Environmental Compliance Management – Experiments with a Real-World Regulation Cadastre (EnviroInfo 2023, 2023). Thimm, Heiko.
  In multiple research disciplines, use cases built on Large Language Models, in particular ChatGPT, are at the centre of today's discussions. For example, in various ongoing projects in the LegalTech area, ChatGPT is evaluated in terms of its potential to replace routine work of lawyers. In a recently started project, we are investigating the use of ChatGPT for a specific corporate compliance management task. In particular, based on a real-world test data set, ChatGPT is prompted to assess the relevance of environmental regulations. The ChatGPT output is compared to the respective judgements of human experts in order to obtain a first indication of the assessment power of ChatGPT in the compliance management domain. This research-in-progress article gives an overview of the evaluation approach and presents first results for a set of 142 test cases covering regulations from four different areas of environmental legislation.
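  A hypothetical sketch of such an evaluation loop, with a placeholder model, prompt, and toy test case standing in for the real cadastre data:

  ```python
  # Sketch: prompt the model for a relevant/not-relevant verdict per
  # regulation and compare it with the expert judgement. Prompt, model
  # name, and the toy test case are illustrative only; the study uses
  # 142 real cases from a regulation cadastre.
  from openai import OpenAI

  client = OpenAI()

  def llm_says_relevant(company_profile: str, regulation: str) -> bool:
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder; the study prompted ChatGPT
          messages=[{"role": "user", "content": (
              f"Company profile: {company_profile}\n"
              f"Regulation: {regulation}\n"
              "Is this regulation relevant for the company? Answer YES or NO."
          )}],
      )
      return response.choices[0].message.content.strip().upper().startswith("YES")

  test_cases = [  # toy stand-in for the real cadastre entries
      {"profile": "Metal-processing plant with on-site wastewater treatment",
       "regulation": "Ordinance on discharging industrial wastewater",
       "expert_relevant": True},
  ]
  hits = sum(llm_says_relevant(c["profile"], c["regulation"]) == c["expert_relevant"]
             for c in test_cases)
  print(f"Agreement with experts: {hits}/{len(test_cases)}")
  ```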
- Conference paper: Augmentation through Generative AI: Exploring the Effects of Human-AI Interaction and Explainable AI on Service Performance (Mensch und Computer 2024 - Workshopband, 2024). Reinhard, Philipp.
  Generative artificial intelligence (GenAI), particularly large language models (LLMs), offers new capabilities for natural language understanding and generation, potentially reducing employee stress and high turnover rates in customer service delivery. However, these systems also present risks, such as generating convincing but erroneous responses, known as hallucinations and confabulations. This study therefore investigates the impact of GenAI on service performance in customer support settings, emphasizing augmentation over automation, to address three key inquiries: identifying patterns of GenAI infusion that alter service routines, assessing the effects of human-AI interaction on cognitive load and task performance, and evaluating the role of explainable AI (XAI) in detecting erroneous responses such as hallucinations. Employing a design science research approach, the study combines literature reviews, expert interviews, and experimental designs to derive implications for designing GenAI-driven augmentation. Preliminary findings reveal three key insights: (1) service employees play a critical role in retaining organizational knowledge and delegating decisions to GenAI agents; (2) utilizing GenAI co-pilots significantly reduces cognitive load during stressful customer interactions; and (3) novice employees struggle to discern accurate AI-generated advice from inaccurate suggestions without additional explanatory context.
- Conference paper: Computer-Assisted Short Answer Grading Using Large Language Models and Rubrics (INFORMATIK 2024, 2024). Metzler, Tim; Plöger, Paul G.; Hees, Jörn.
  Grading student answers and providing feedback are essential yet time-consuming tasks for educators. Recent advancements in Large Language Models (LLMs), including ChatGPT, Llama, and Mistral, have paved the way for automated support in this domain. This paper investigates the efficacy of instruction-following LLMs in adhering to predefined rubrics for evaluating student answers and delivering meaningful feedback. Leveraging the Mohler dataset and a custom German dataset, we evaluate various models, from commercial ones like ChatGPT to smaller open-source options like Llama, Mistral, and Command R. Additionally, we explore the impact of the temperature parameter and techniques such as few-shot prompting. Surprisingly, while few-shot prompting brings grading accuracy closer to the ground truth, it introduces model inconsistency. Furthermore, some models exhibit non-deterministic behavior even at near-zero temperature settings. Our findings highlight the importance of rubrics in enhancing the interpretability of model outputs and fostering consistency in grading practices.
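  A small sketch of rubric-guided grading along these lines; the rubric, score scale, and model name are illustrative assumptions, not the paper's materials:

  ```python
  # Sketch of rubric-guided grading with an instruction-following LLM.
  # Rubric text, score scale, and model name are illustrative assumptions.
  from openai import OpenAI

  client = OpenAI()

  RUBRIC = (
      "Score from 0 to 5:\n"
      "5 = all key concepts correct and complete\n"
      "3 = main idea correct, but details missing\n"
      "0 = off-topic or wrong"
  )

  def grade(question: str, reference: str, student_answer: str) -> str:
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder for ChatGPT/Llama/Mistral
          temperature=0.0,  # the paper observes non-determinism even near zero
          messages=[{"role": "user", "content": (
              f"Rubric:\n{RUBRIC}\n\n"
              f"Question: {question}\n"
              f"Reference answer: {reference}\n"
              f"Student answer: {student_answer}\n"
              "Return the score and a one-sentence justification."
          )}],
      )
      return response.choices[0].message.content

  print(grade("What does a compiler do?",
              "It translates source code into machine code.",
              "It turns programs into executable machine instructions."))
  ```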
- Conference paper: Engineering A Reliable Prompt For Generating Unit Tests - Prompt engineering for QA & QA for prompt engineering (Softwaretechnik-Trends Band 43, Heft 3, 2023). Faragó, David.
  This paper demonstrates Prompt Engineering (PE) on a running example: generating unit test cases for a given function. By iteratively adding further prompt patterns and measuring the robustness, correctness, and comprehensiveness of the AI's output, multiple prompt patterns and their purpose and strengths are investigated. We conclude that high robustness, correctness, and comprehensiveness are hard to achieve, and that many prompt patterns (single-prompt patterns as well as patterns that span a conversation) are necessary. More generally, quality assurance is a dominant part of PE and closely intertwined with the development part of PE. Traditional testing processes and stages therefore do not adequately apply to QA for PE, and we suggest, as an alternative, a PE process that covers both the development and the quality assurance of prompts.
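  An illustrative single-prompt sketch of the task; the paper layers several patterns iteratively, of which only a persona and an output-constraint pattern are echoed here with placeholder wording:

  ```python
  # Illustrative single-prompt sketch for unit-test generation.
  # Function, prompt wording, and model are placeholders.
  from openai import OpenAI

  client = OpenAI()

  FUNCTION_UNDER_TEST = '''
  def clamp(x, lo, hi):
      return max(lo, min(x, hi))
  '''

  prompt = (
      "You are a meticulous QA engineer. "                   # persona pattern
      "Write pytest unit tests for the function below. "
      "Cover normal cases, boundary values, and invalid input. "
      "Output only runnable Python code.\n"                  # output-constraint pattern
      + FUNCTION_UNDER_TEST
  )

  response = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model
      messages=[{"role": "user", "content": prompt}],
  )
  print(response.choices[0].message.content)
  ```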
- Conference paper: Evaluating Task-Level Struggle Detection Methods in Intelligent Tutoring Systems for Programming (Proceedings of DELFI 2024, 2024). Dannath, Jesper; Deriyeva, Alina; Paaßen, Benjamin.
  Intelligent Tutoring Systems require student modeling in order to make pedagogical decisions, such as individualized feedback or task selection. Typically, student modeling is based on the eventual correctness of tasks. However, for multi-step or iterative learning tasks, as in programming, the intermediate states on the way to a correct solution also carry crucial information about learner skill. We investigate how to detect learners who struggle on their path towards a correct solution of a task. Prior work addressed struggle detection in programming environments at different granularity levels, but has mostly focused on preventing course dropout. We conducted a pilot study of our programming learning environment and evaluated different approaches for struggle detection at the task level. For the evaluation of the measures, we use downstream Item Response Theory competency models. We find that detecting struggle based on large language model text embeddings outperforms the chosen baselines with regard to correlation with a programming competency proxy.
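  A rough sketch of what such an embedding-based detector could look like; the embedding model, toy data, and classifier choice are assumptions, not the authors' pipeline:

  ```python
  # Sketch of an embedding-based struggle detector: embed intermediate
  # code submissions and fit a simple classifier on struggle labels.
  import numpy as np
  from openai import OpenAI
  from sklearn.linear_model import LogisticRegression

  client = OpenAI()

  def embed(snippets: list[str]) -> np.ndarray:
      result = client.embeddings.create(model="text-embedding-3-small",
                                        input=snippets)
      return np.array([item.embedding for item in result.data])

  # Toy training data: intermediate code states with labels (1 = struggled).
  submissions = ["for i in range(10) print(i)",
                 "for i in range(10):\n    print(i)"]
  labels = [1, 0]

  clf = LogisticRegression().fit(embed(submissions), labels)
  print(clf.predict_proba(embed(["while True pass"]))[:, 1])  # struggle probability
  ```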
- Conference paper: Evaluation von LLM- und Intent-basierten Ansätzen zur Umsetzung eines Chatbots für die Unterstützung bei der Studienorganisation (Proceedings of DELFI 2024, 2024). Cordes, Andre.
  Conversational user interfaces such as chatbots offer great potential for supporting students with organizing their studies, complementing existing advisory services. Advances in Large Language Models (LLMs) in particular open up new approaches to building such chatbots. These approaches come with both opportunities and risks, however, so the choice of a suitable approach must be weighed carefully. This contribution examines and compares three approaches to building such chatbots: ChatGPT with Retrieval-Augmented Generation (RAG), the open-source LLM Mistral with RAG, and an intent-based chatbot. The approaches are compared with respect to answer quality and risks (e.g., hallucinations). Overall, all three approaches could potentially be used to support study organization. However, the findings do not yield a clear recommendation for any one approach, which is why future work should investigate a hybrid chatbot.
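  For contrast with the RAG variants, a naive sketch of the intent-based baseline idea; the intents, keywords, and canned answers are invented for illustration, and production systems would use a trained intent classifier:

  ```python
  # Toy intent-based chatbot: a fixed intent inventory with canned
  # answers, matched here by naive keyword overlap.
  INTENTS = {
      "exam_registration": (
          {"exam", "register", "registration"},
          "You can register for exams in the campus portal.",
      ),
      "semester_fee": (
          {"fee", "semester", "payment"},
          "The semester fee is due at re-registration.",
      ),
  }

  FALLBACK = "Sorry, I did not understand the question."

  def match_intent(question: str) -> str:
      tokens = set(question.lower().replace("?", "").split())
      keywords, answer = max(INTENTS.values(),
                             key=lambda item: len(tokens & item[0]))
      return answer if tokens & keywords else FALLBACK

  print(match_intent("How do I register for an exam?"))
  ```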
- Conference paper: Expanding Knowledge Graphs Through Text: Leveraging Large Language Models for Inductive Link Prediction (INFORMATIK 2024, 2024). Hamann, Felix; Falk, Maurice; Walker, Lukas.
  Knowledge graphs (KGs) play a crucial role in knowledge modeling for various domains such as web search, medical applications, or technical support, yet they are often incomplete. To mitigate this problem, knowledge graph completion (KGC) may be used to infer missing links of the graph. Taking this a step further, in an automated knowledge acquisition process, links for entirely new, unseen entities may be incorporated. This process is known as inductive link prediction (I-LP). Optionally, text is leveraged as an external source of information to infer the correct linkage of such entities. Depending on the context, this text either provides a comprehensive singular description of the entity or includes numerous incidental references to it. This paper presents a study that explores the application of LLAMA3, as a representative of the current generation of large language models (LLMs), to I-LP. Through experimentation on popular benchmark datasets such as Wikidata5m, FB15k-237, WN18RR, and IRT2, we evaluate the performance of LLMs at inserting new facts into a knowledge base, given textual references to the target object. These benchmarks, by design, exhibit significant variations in the quality of the associated text as well as in the number of entities and links included. The paper explores several prompt formulations and studies whether pre-emptive retrieval of text helps. For automated link prediction, we implement the full cycle of prompt generation, answer processing, entity candidate lookup, and finally link prediction. Our results show that LLM-based inductive link prediction is outperformed by previously proposed models that fine-tune task-specific LM encoders.
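  A compact sketch of that four-step cycle under strong simplifications; the prompt, the placeholder model, and the toy entity index are assumptions, not the paper's implementation:

  ```python
  # Sketch of the I-LP cycle: prompt generation, answer processing,
  # entity candidate lookup, and link prediction.
  from openai import OpenAI

  client = OpenAI()

  ENTITY_INDEX = {"berlin": "Q64", "germany": "Q183"}  # toy label -> KG-id map

  def predict_link(entity_text: str, relation: str) -> str | None:
      # 1) prompt generation
      prompt = (f"Text mentioning a new entity:\n{entity_text}\n"
                f"Which known entity does it link to via '{relation}'? "
                "Answer with the entity name only.")
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder standing in for LLAMA3
          messages=[{"role": "user", "content": prompt}],
      )
      # 2) answer processing
      name = response.choices[0].message.content.strip().lower().rstrip(".")
      # 3) entity candidate lookup -> 4) predicted link target (or None)
      return ENTITY_INDEX.get(name)

  print(predict_link("Kreuzberg is a district in the capital of Germany.", "located in"))
  ```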
- Conference paper: Immersive Räume zur Kreativitätsunterstützung: Ein intelligenter Lehr- und Lernraum (Proceedings of DELFI 2024, 2024). Fuchs, Andreas; Appel, Sven; Grimm, Paul.
  This contribution presents a novel approach to designing immersive spaces for higher-education teaching that support collaborative creativity processes based on behaviour, spoken word, and mood. The goal is to give teachers and learners new creative impulses in an interactive virtual-reality environment through AI-analysed and AI-generated content. Integrating natural language processing (NLP) and artificial intelligence improves human-computer interaction in order to foster seamless collaboration. The intelligent system processes user data and adapts the environment to the individual needs of the participants, enabling collaborative work in a shared yet individualized environment. The application uses generative AI to create images based on the processed speech and conversation content, and it influences design elements such as lighting, colour mood, and acoustics. The contribution discusses technical aspects and potential applications in education, entertainment, and the workplace. The research findings suggest that this approach is promising for fostering creativity and enhancing well-being.
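  A very rough sketch of one step of this pipeline, turning a conversation summary into a generated mood image; the image model, prompt wording, and summary text are assumptions:

  ```python
  # Sketch: generate a mood image from the gist of a transcribed conversation.
  from openai import OpenAI

  client = OpenAI()

  transcript_summary = "The group is brainstorming ideas for a sustainable campus garden."

  image = client.images.generate(
      model="dall-e-3",  # placeholder generative image model
      prompt=f"Atmospheric, inspiring illustration of: {transcript_summary}",
      size="1024x1024",
  )
  print(image.data[0].url)  # image URL that the room display could load
  ```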