Listing by keyword "Large Language Models"
1 - 10 of 25
- Student Paper: AI-based chatbots as enabler for efficient external knowledge management in public administration (7. Fachtagung Rechts- und Verwaltungsinformatik (RVI 2024): Neue Wege der Zusammenarbeit und Vernetzung für digitale Transformation und Verwaltungsmodernisierung, 2024). Wiethölter, Jost; Kühl, Linus; Feldmann, Carsten.
  This study addresses the pressing issue of staff shortages in German public administrations through the lens of digitalization, focusing on the potential of AI-based chatbots to solve this problem by replacing human labour. Employing a Design Science Research Process (DSRP) methodology, the research synthesizes theoretical foundations and regulatory frameworks to develop a robust chatbot concept. The artifact presented is a comprehensive architectural framework integrating user-centric design, linguistic processing, and regulatory compliance. The proposed artifact navigates complex federal structures and diverse IT infrastructures, promoting accessibility and inclusivity. The implications suggest enhanced efficiency and accessibility in public service delivery, potentially increasing citizen satisfaction and decreasing employee workload. The study underscores the importance of legal compliance and the evolving regulatory landscape in AI deployment. Future research will involve prototyping and evaluating the artifact's performance and applicability throughout the course of the DSRP, thus contributing to the advancement of digital transformation in public administrations.
- Conference paper: Assessing Large Language Models in the Agricultural Sector: A Comprehensive Analysis Utilizing a Novel Synthetic Benchmark Dataset (INFORMATIK 2024, 2024). Kästing, Marvin; Hänig, Christian.
  This paper provides a comprehensive study of Large Language Models (LLMs) for question-answering and information-retrieval tasks within the agricultural domain. We introduce the novel benchmark dataset BVL QA Corpus 2024, specifically designed to thoroughly evaluate both commercial and non-commercial LLMs in agricultural contexts. Using LLMs, we generate question-answer pairs from paragraphs extracted from domain-specific agricultural documents. Leveraging this newly developed benchmark dataset, we assess a selection of LLMs using standard metrics. Additionally, we develop a prototype Retrieval-Augmented Generation (RAG) system tailored to the agricultural sector. This system is then compared to the baseline evaluations to determine how closely actual performance matches the initial upper-bound estimates. Our empirical analysis demonstrates that RAG systems outperform baseline LLMs across all metrics.
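  A minimal sketch of the QA-pair generation step described above, assuming an OpenAI-compatible Python client; the model name and prompt wording are placeholders rather than the authors' actual setup:

  ```python
  # Minimal sketch of LLM-based QA-pair generation from a domain paragraph.
  # Model name and prompt wording are illustrative placeholders.
  from openai import OpenAI

  client = OpenAI()  # expects OPENAI_API_KEY in the environment

  def generate_qa_pair(paragraph: str) -> str:
      """Ask the model to derive one grounded question-answer pair."""
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder model
          messages=[
              {"role": "system",
               "content": "You write benchmark questions for agricultural texts."},
              {"role": "user",
               "content": "Write one question and its answer, both fully "
                          "grounded in this paragraph:\n\n" + paragraph},
          ],
      )
      return response.choices[0].message.content

  print(generate_qa_pair("Winter wheat is typically sown between late September and November."))
  ```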
- Conference paper: Assessment Power of ChatGPT in the Context of Environmental Compliance Management – Experiments with a Real-World Regulation Cadastre (EnviroInfo 2023, 2023). Thimm, Heiko.
  In multiple research disciplines, use cases built on Large Language Models, in particular ChatGPT, are at the centre of today's discussions. For example, in various ongoing projects in the LegalTech area, ChatGPT is evaluated in terms of its potential to replace routine work of lawyers. In a recently started project, we are investigating the use of ChatGPT for a specific corporate compliance management task. In particular, based on a real-world test data set, ChatGPT is prompted to assess the relevance of environmental regulations. The ChatGPT output is compared to the respective judgements of human experts in order to obtain a first indication of the assessment power of ChatGPT in the compliance management domain. This research-in-progress article gives an overview of the evaluation approach and presents first results for a set of 142 test cases covering regulations from four different areas of environmental legislation.
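  A hypothetical sketch of such an evaluation loop, with a placeholder model, prompt, and toy test case standing in for the real cadastre data:

  ```python
  # Sketch: prompt the model for a relevant/not-relevant verdict per
  # regulation and compare it with the expert judgement. Prompt, model
  # name, and the toy test case are illustrative only; the study uses
  # 142 real cases from a regulation cadastre.
  from openai import OpenAI

  client = OpenAI()

  def llm_says_relevant(company_profile: str, regulation: str) -> bool:
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder; the study prompted ChatGPT
          messages=[{"role": "user", "content": (
              f"Company profile: {company_profile}\n"
              f"Regulation: {regulation}\n"
              "Is this regulation relevant for the company? Answer YES or NO."
          )}],
      )
      return response.choices[0].message.content.strip().upper().startswith("YES")

  test_cases = [  # toy stand-in for the real cadastre entries
      {"profile": "Metal-processing plant with on-site wastewater treatment",
       "regulation": "Ordinance on discharging industrial wastewater",
       "expert_relevant": True},
  ]
  hits = sum(llm_says_relevant(c["profile"], c["regulation"]) == c["expert_relevant"]
             for c in test_cases)
  print(f"Agreement with experts: {hits}/{len(test_cases)}")
  ```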
- Conference paper: Augmentation through Generative AI: Exploring the Effects of Human-AI Interaction and Explainable AI on Service Performance (Mensch und Computer 2024 - Workshopband, 2024). Reinhard, Philipp.
  Generative artificial intelligence (GenAI), particularly large language models (LLMs), offers new capabilities for natural language understanding and generation, potentially reducing employee stress and high turnover rates in customer service delivery. However, these systems also present risks, such as generating convincing but erroneous responses, known as hallucinations and confabulations. This study therefore investigates the impact of GenAI on service performance in customer support settings, emphasizing augmentation over automation, to address three key inquiries: identifying patterns of GenAI infusion that alter service routines, assessing the effects of human-AI interaction on cognitive load and task performance, and evaluating the role of explainable AI (XAI) in detecting erroneous responses such as hallucinations. Employing a design science research approach, the study combines literature reviews, expert interviews, and experimental designs to derive implications for designing GenAI-driven augmentation. Preliminary findings reveal three key insights: (1) service employees play a critical role in retaining organizational knowledge and delegating decisions to GenAI agents; (2) utilizing GenAI co-pilots significantly reduces cognitive load during stressful customer interactions; and (3) novice employees struggle to discern accurate AI-generated advice from inaccurate suggestions without additional explanatory context.
- Conference paper: Computer-Assisted Short Answer Grading Using Large Language Models and Rubrics (INFORMATIK 2024, 2024). Metzler, Tim; Plöger, Paul G.; Hees, Jörn.
  Grading student answers and providing feedback are essential yet time-consuming tasks for educators. Recent advancements in Large Language Models (LLMs), including ChatGPT, Llama, and Mistral, have paved the way for automated support in this domain. This paper investigates the efficacy of instruction-following LLMs in adhering to predefined rubrics for evaluating student answers and delivering meaningful feedback. Leveraging the Mohler dataset and a custom German dataset, we evaluate various models, from commercial ones like ChatGPT to smaller open-source options like Llama, Mistral, and Command R. Additionally, we explore the impact of the temperature parameter and techniques such as few-shot prompting. Surprisingly, while few-shot prompting brings grading accuracy closer to the ground truth, it introduces model inconsistency. Furthermore, some models exhibit non-deterministic behavior even at near-zero temperature settings. Our findings highlight the importance of rubrics in enhancing the interpretability of model outputs and fostering consistency in grading practices.
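  A small sketch of rubric-guided grading along these lines; the rubric, score scale, and model name are illustrative assumptions, not the paper's materials:

  ```python
  # Sketch of rubric-guided grading with an instruction-following LLM.
  # Rubric text, score scale, and model name are illustrative assumptions.
  from openai import OpenAI

  client = OpenAI()

  RUBRIC = (
      "Score from 0 to 5:\n"
      "5 = all key concepts correct and complete\n"
      "3 = main idea correct, but details missing\n"
      "0 = off-topic or wrong"
  )

  def grade(question: str, reference: str, student_answer: str) -> str:
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder for ChatGPT/Llama/Mistral
          temperature=0.0,  # the paper observes non-determinism even near zero
          messages=[{"role": "user", "content": (
              f"Rubric:\n{RUBRIC}\n\n"
              f"Question: {question}\n"
              f"Reference answer: {reference}\n"
              f"Student answer: {student_answer}\n"
              "Return the score and a one-sentence justification."
          )}],
      )
      return response.choices[0].message.content

  print(grade("What does a compiler do?",
              "It translates source code into machine code.",
              "It turns programs into executable machine instructions."))
  ```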
- Conference paper: Engineering A Reliable Prompt For Generating Unit Tests - Prompt engineering for QA & QA for prompt engineering (Softwaretechnik-Trends Band 43, Heft 3, 2023). Faragó, David.
  This paper demonstrates Prompt Engineering (PE) on a running example: generating unit test cases for a given function. By iteratively adding further prompt patterns and measuring the robustness, correctness, and comprehensiveness of the AI's output, multiple prompt patterns and their purpose and strengths are investigated. We conclude that high robustness, correctness, and comprehensiveness are hard to achieve, and that many prompt patterns (single-prompt patterns as well as patterns that span a conversation) are necessary. More generally, quality assurance is a dominant part of PE and closely intertwined with the development part of PE. Traditional testing processes and stages therefore do not adequately apply to QA for PE, and we suggest, as an alternative, a PE process that covers both the development and the quality assurance of prompts.
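  An illustrative single-prompt sketch of the task; the paper layers several patterns iteratively, of which only a persona and an output-constraint pattern are echoed here with placeholder wording:

  ```python
  # Illustrative single-prompt sketch for unit-test generation.
  # Function, prompt wording, and model are placeholders.
  from openai import OpenAI

  client = OpenAI()

  FUNCTION_UNDER_TEST = '''
  def clamp(x, lo, hi):
      return max(lo, min(x, hi))
  '''

  prompt = (
      "You are a meticulous QA engineer. "                   # persona pattern
      "Write pytest unit tests for the function below. "
      "Cover normal cases, boundary values, and invalid input. "
      "Output only runnable Python code.\n"                  # output-constraint pattern
      + FUNCTION_UNDER_TEST
  )

  response = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder model
      messages=[{"role": "user", "content": prompt}],
  )
  print(response.choices[0].message.content)
  ```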
- Conference paper: Evaluating Task-Level Struggle Detection Methods in Intelligent Tutoring Systems for Programming (Proceedings of DELFI 2024, 2024). Dannath, Jesper; Deriyeva, Alina; Paaßen, Benjamin.
  Intelligent Tutoring Systems require student modeling in order to make pedagogical decisions, such as individualized feedback or task selection. Typically, student modeling is based on the eventual correctness of tasks. However, for multi-step or iterative learning tasks, as in programming, the intermediate states on the way to a correct solution also carry crucial information about learner skill. We investigate how to detect learners who struggle on their path towards a correct solution of a task. Prior work addressed struggle detection in programming environments at different granularity levels, but has mostly focused on preventing course dropout. We conducted a pilot study of our programming learning environment and evaluated different approaches for struggle detection at the task level. For the evaluation of the measures, we use downstream Item Response Theory competency models. We find that detecting struggle based on large language model text embeddings outperforms the chosen baselines with regard to correlation with a programming competency proxy.
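  A rough sketch of what such an embedding-based detector could look like; the embedding model, toy data, and classifier choice are assumptions, not the authors' pipeline:

  ```python
  # Sketch of an embedding-based struggle detector: embed intermediate
  # code submissions and fit a simple classifier on struggle labels.
  import numpy as np
  from openai import OpenAI
  from sklearn.linear_model import LogisticRegression

  client = OpenAI()

  def embed(snippets: list[str]) -> np.ndarray:
      result = client.embeddings.create(model="text-embedding-3-small",
                                        input=snippets)
      return np.array([item.embedding for item in result.data])

  # Toy training data: intermediate code states with labels (1 = struggled).
  submissions = ["for i in range(10) print(i)",
                 "for i in range(10):\n    print(i)"]
  labels = [1, 0]

  clf = LogisticRegression().fit(embed(submissions), labels)
  print(clf.predict_proba(embed(["while True pass"]))[:, 1])  # struggle probability
  ```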
- Conference paper: Evaluation von LLM- und Intent-basierten Ansätzen zur Umsetzung eines Chatbots für die Unterstützung bei der Studienorganisation (Proceedings of DELFI 2024, 2024). Cordes, Andre.
  Conversational user interfaces such as chatbots offer great potential for supporting students with organizing their studies, complementing existing advisory services. Advances in Large Language Models (LLMs) in particular open up new approaches to building such chatbots. These approaches come with both opportunities and risks, however, so the choice of a suitable approach must be weighed carefully. This contribution examines and compares three approaches to building such chatbots: ChatGPT with Retrieval-Augmented Generation (RAG), the open-source LLM Mistral with RAG, and an intent-based chatbot. The approaches are compared with respect to answer quality and risks (e.g., hallucinations). Overall, all three approaches could potentially be used to support study organization. However, the findings do not yield a clear recommendation for any one approach, which is why future work should investigate a hybrid chatbot.
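  For contrast with the RAG variants, a naive sketch of the intent-based baseline idea; the intents, keywords, and canned answers are invented for illustration, and production systems would use a trained intent classifier:

  ```python
  # Toy intent-based chatbot: a fixed intent inventory with canned
  # answers, matched here by naive keyword overlap.
  INTENTS = {
      "exam_registration": (
          {"exam", "register", "registration"},
          "You can register for exams in the campus portal.",
      ),
      "semester_fee": (
          {"fee", "semester", "payment"},
          "The semester fee is due at re-registration.",
      ),
  }

  FALLBACK = "Sorry, I did not understand the question."

  def match_intent(question: str) -> str:
      tokens = set(question.lower().replace("?", "").split())
      keywords, answer = max(INTENTS.values(),
                             key=lambda item: len(tokens & item[0]))
      return answer if tokens & keywords else FALLBACK

  print(match_intent("How do I register for an exam?"))
  ```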
- Conference paper: Expanding Knowledge Graphs Through Text: Leveraging Large Language Models for Inductive Link Prediction (INFORMATIK 2024, 2024). Hamann, Felix; Falk, Maurice; Walker, Lukas.
  Knowledge graphs (KGs) play a crucial role in knowledge modeling for various domains such as web search, medical applications, or technical support, yet they are often incomplete. To mitigate this problem, knowledge graph completion (KGC) may be used to infer missing links of the graph. Taking this a step further, in an automated knowledge acquisition process, links for entirely new, unseen entities may be incorporated. This process is known as inductive link prediction (I-LP). Optionally, text is leveraged as an external source of information to infer the correct linkage of such entities. Depending on the context, this text either provides a comprehensive singular description of the entity or includes numerous incidental references to it. This paper presents a study that explores the application of LLAMA3, as a representative of the current generation of large language models (LLMs), to I-LP. Through experimentation on popular benchmark datasets such as Wikidata5m, FB15k-237, WN18RR, and IRT2, we evaluate the performance of LLMs at inserting new facts into a knowledge base, given textual references to the target object. These benchmarks, by design, exhibit significant variations in the quality of the associated text as well as in the number of entities and links included. The paper explores several prompt formulations and studies whether pre-emptive retrieval of text helps. For automated link prediction, we implement the full cycle of prompt generation, answer processing, entity candidate lookup, and finally link prediction. Our results show that LLM-based inductive link prediction is outperformed by previously proposed models that fine-tune task-specific LM encoders.
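  A compact sketch of that four-step cycle under strong simplifications; the prompt, the placeholder model, and the toy entity index are assumptions, not the paper's implementation:

  ```python
  # Sketch of the I-LP cycle: prompt generation, answer processing,
  # entity candidate lookup, and link prediction.
  from openai import OpenAI

  client = OpenAI()

  ENTITY_INDEX = {"berlin": "Q64", "germany": "Q183"}  # toy label -> KG-id map

  def predict_link(entity_text: str, relation: str) -> str | None:
      # 1) prompt generation
      prompt = (f"Text mentioning a new entity:\n{entity_text}\n"
                f"Which known entity does it link to via '{relation}'? "
                "Answer with the entity name only.")
      response = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder standing in for LLAMA3
          messages=[{"role": "user", "content": prompt}],
      )
      # 2) answer processing
      name = response.choices[0].message.content.strip().lower().rstrip(".")
      # 3) entity candidate lookup -> 4) predicted link target (or None)
      return ENTITY_INDEX.get(name)

  print(predict_link("Kreuzberg is a district in the capital of Germany.", "located in"))
  ```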
- Conference paper: Immersive Räume zur Kreativitätsunterstützung: Ein intelligenter Lehr- und Lernraum (Proceedings of DELFI 2024, 2024). Fuchs, Andreas; Appel, Sven; Grimm, Paul.
  This contribution presents a novel approach to designing immersive spaces for higher-education teaching that support collaborative creativity processes based on behaviour, spoken word, and mood. The goal is to give teachers and learners new creative impulses in an interactive virtual-reality environment through AI-analysed and AI-generated content. Integrating natural language processing (NLP) and artificial intelligence improves human-computer interaction in order to foster seamless collaboration. The intelligent system processes user data and adapts the environment to the individual needs of the participants, enabling collaborative work in a shared yet individualized environment. The application uses generative AI to create images based on the processed speech and conversation content, and it influences design elements such as lighting, colour mood, and acoustics. The contribution discusses technical aspects and potential applications in education, entertainment, and the workplace. The research findings suggest that this approach is promising for fostering creativity and enhancing well-being.
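  A very rough sketch of one step of this pipeline, turning a conversation summary into a generated mood image; the image model, prompt wording, and summary text are assumptions:

  ```python
  # Sketch: generate a mood image from the gist of a transcribed conversation.
  from openai import OpenAI

  client = OpenAI()

  transcript_summary = "The group is brainstorming ideas for a sustainable campus garden."

  image = client.images.generate(
      model="dall-e-3",  # placeholder generative image model
      prompt=f"Atmospheric, inspiring illustration of: {transcript_summary}",
      size="1024x1024",
  )
  print(image.data[0].url)  # image URL that the room display could load
  ```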