Title: Assessing Large Language Models in the Agricultural Sector: A Comprehensive Analysis Utilizing a Novel Synthetic Benchmark Dataset
Authors: Kästing, Marvin; Hänig, Christian
Editors: Klein, Maike; Krupka, Daniel; Winter, Cornelia; Gergeleit, Martin; Martin, Ludger
Date: 2024-10-21
Year: 2024
Type: Text/Conference Paper
Language: en
ISBN: 978-3-88579-746-3
ISSN: 1617-5468
DOI: 10.18420/inf2024_113
URI: https://dl.gi.de/handle/20.500.12116/45085
Keywords: Large Language Models; Retrieval-Augmented Generation; Agricultural Information Retrieval; Benchmark Dataset

Abstract: This paper provides a comprehensive study of Large Language Models (LLMs) for question-answering and information retrieval tasks within the agricultural domain. We introduce the novel benchmark dataset BVL QA Corpus 2024, specifically designed to thoroughly evaluate both commercial and non-commercial LLMs in agricultural contexts. Using LLMs, we generate question-answer pairs from paragraphs extracted from domain-specific agricultural documents. Leveraging this newly developed benchmark dataset, we assess a selection of LLMs using standard metrics. Additionally, we develop a prototype Retrieval-Augmented Generation (RAG) system tailored to the agricultural sector. This system is then compared to baseline evaluations to determine the degree of alignment between actual performance and initial upper-limit estimates. Our empirical analysis demonstrates that RAG systems outperform baseline LLMs across all metrics.
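The abstract outlines two technical steps: answering benchmark questions with a baseline LLM versus a Retrieval-Augmented Generation prototype, and scoring both against reference answers with standard metrics. Below is a minimal, self-contained sketch of that comparison loop, not the authors' implementation: the `llm` stub, the toy word-overlap retriever, and the token-level F1 metric are illustrative assumptions standing in for a real model, retriever, and evaluation suite.

```python
from collections import Counter
from typing import Callable, List

def retrieve(question: str, paragraphs: List[str], k: int = 2) -> List[str]:
    """Toy lexical retriever: rank paragraphs by word overlap with the question.
    (Illustrative stand-in for a real BM25 or dense retriever.)"""
    q_words = set(question.lower().split())
    ranked = sorted(paragraphs, key=lambda p: -len(q_words & set(p.lower().split())))
    return ranked[:k]

def rag_answer(question: str, paragraphs: List[str], llm: Callable[[str], str]) -> str:
    """RAG-style answer: prepend retrieved context to the prompt."""
    context = "\n".join(retrieve(question, paragraphs))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)

def baseline_answer(question: str, llm: Callable[[str], str]) -> str:
    """Baseline: the LLM answers from its parametric knowledge alone."""
    return llm(f"Question: {question}\nAnswer:")

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1, one common QA metric (assumed here, not taken from the paper)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical benchmark item in the spirit of a synthetic agricultural QA corpus.
    paragraphs = [
        "Winter wheat in the trial region is usually sown between late September and mid October.",
        "Crop rotation with legumes improves soil nitrogen availability.",
    ]
    item = {"question": "When is winter wheat usually sown?",
            "answer": "between late September and mid October"}

    # Offline stub so the sketch runs as-is; replace with a real model call.
    def llm(prompt: str) -> str:
        return "late September to mid October" if "context" in prompt else "in spring"

    for name, pred in [("baseline", baseline_answer(item["question"], llm)),
                       ("RAG", rag_answer(item["question"], paragraphs, llm))]:
        print(f"{name}: {pred!r}  F1={token_f1(pred, item['answer']):.2f}")
```

Run over a full QA corpus, averaging such per-item scores per model would yield the kind of baseline-versus-RAG comparison the abstract describes.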