Title: On Data Spaces for Retrieval Augmented Generation
Authors: Hermsen, Felix; Nitz, Lasse; Akbari Gurabi, Mehdi; Matzutt, Roman; Mandal, Avikarsha
Editors: Klein, Maike; Krupka, Daniel; Winter, Cornelia; Gergeleit, Martin; Martin, Ludger
Date issued: 2024-10-21
Year: 2024
ISBN: 978-3-88579-746-3
ISSN: 1617-5468
DOI: 10.18420/inf2024_57
URI: https://dl.gi.de/handle/20.500.12116/45218
Language: en
Type: Text/Conference Paper
Keywords: Data Spaces; Large Language Models; Retrieval Augmented Generation; Data Sharing

Abstract: Large Language Models (LLMs) have revolutionized knowledge retrieval from natural language queries. However, LLMs still face challenges in producing domain-specific and accurate answers. Recently, the Retrieval Augmented Generation (RAG) architecture has been proposed as one approach to addressing these challenges. While current research focuses on optimizing document retrieval and augmenting the initial query accordingly, we identify untapped potential for RAG to retrieve knowledge from heterogeneous data sources via data spaces. In this work, we investigate three conceptual integration scenarios between RAG and data spaces. Our findings indicate that a data-space-extended RAG could provide domain-specific information retrieval across diverse data sources. However, solutions to mitigate unintended information leakage require further consideration.