LLMs on the Edge: Quality, Latency, and Energy Efficiency

dc.contributor.author: Bast, Sebastian
dc.contributor.author: Begic Fazlic, Lejla
dc.contributor.author: Naumann, Stefan
dc.contributor.author: Dartmann, Guido
dc.contributor.editor: Klein, Maike
dc.contributor.editor: Krupka, Daniel
dc.contributor.editor: Winter, Cornelia
dc.contributor.editor: Gergeleit, Martin
dc.contributor.editor: Martin, Ludger
dc.date.accessioned: 2024-10-21T18:24:12Z
dc.date.available: 2024-10-21T18:24:12Z
dc.date.issued: 2024
dc.description.abstract: Generative Artificial Intelligence has become integral to many people's lives, with Large Language Models (LLMs) gaining popularity in both science and society. While training these models is known to require significant energy, inference also contributes substantially to their total energy consumption. This study investigates how to use LLMs sustainably by examining the efficiency of inference, particularly on local hardware with limited computing resources. We develop metrics to quantify the efficiency of LLMs on edge devices, focusing on quality, latency, and energy consumption. Our comparison of three state-of-the-art generative models on edge devices shows that they achieve quality scores ranging from 73.3% to 85.9%, generate 1.83 to 3.51 tokens per second, and consume between 0.93 and 1.76 mWh of energy per token on a single-board computer without GPU support. The findings suggest that generative models can produce satisfactory outcomes on edge devices, but thorough efficiency evaluations are recommended before deployment in production environments.
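For context, the energy-per-token figure reported in the abstract follows directly from average power draw and token throughput. A minimal sketch of that arithmetic (the power and throughput values below are illustrative assumptions, not measurements from the paper):

```python
def energy_per_token_mwh(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token in mWh.

    power [W] / throughput [tok/s] gives J per token;
    1 mWh = 3.6 J, so dividing by 3.6 converts to mWh/token.
    """
    joules_per_token = avg_power_watts / tokens_per_second
    return joules_per_token / 3.6

# Illustration: a single-board computer drawing ~12 W while generating
# 3.5 tokens/s lands near the lower end of the reported 0.93-1.76 mWh range.
print(round(energy_per_token_mwh(12.0, 3.5), 2))  # ~0.95
```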
dc.identifier.doi: 10.18420/inf2024_104
dc.identifier.eissn: 2944-7682
dc.identifier.isbn: 978-3-88579-746-3
dc.identifier.issn: 2944-7682
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45075
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: INFORMATIK 2024
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject: Large Language Models
dc.subject: Generative Artificial Intelligence
dc.subject: Edge Devices
dc.subject: Efficiency
dc.title: LLMs on the Edge: Quality, Latency, and Energy Efficiency
dc.type: Text/Conference Paper
gi.citation.endPage: 1192
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 1183
gi.conference.date: 24-26 September 2024
gi.conference.location: Wiesbaden
gi.conference.sessiontitle: 5. Workshop "KI in der Umweltinformatik" (KIU-2024)

Files

Original bundle
1 - 1 of 1
Name: Bast_et_al_LLMs_on_the_Edge.pdf
Size: 825.38 KB
Format: Adobe Portable Document Format