LLMs on the Edge: Quality, Latency, and Energy Efficiency

Bast, Sebastian; Begic Fazlic, Lejla; Naumann, Stefan; Dartmann, Guido

dc.contributor.author	Bast, Sebastian
dc.contributor.author	Begic Fazlic, Lejla
dc.contributor.author	Naumann, Stefan
dc.contributor.author	Dartmann, Guido
dc.contributor.editor	Klein, Maike
dc.contributor.editor	Krupka, Daniel
dc.contributor.editor	Winter, Cornelia
dc.contributor.editor	Gergeleit, Martin
dc.contributor.editor	Martin, Ludger
dc.date.accessioned	2024-10-21T18:24:12Z
dc.date.available	2024-10-21T18:24:12Z
dc.date.issued	2024
dc.description.abstract	Generative Artificial Intelligence has become integral to many people's lives, with Large Language Models (LLMs) gaining popularity in both science and society. While training these models is known to require significant energy, inference also contributes substantially to their total energy consumption. This study investigates how to use LLMs sustainably by examining the efficiency of inference, particularly on local hardware with limited computing resources. We develop metrics to quantify the efficiency of LLMs on edge devices, focusing on quality, latency, and energy consumption. Our comparison of three state-of-the-art generative models on edge devices shows that they achieve quality scores ranging from 73.3% to 85.9%, generate 1.83 to 3.51 tokens per second, and consume between 0.93 and 1.76 mWh of energy per token on a single-board computer without GPU support. The findings suggest that generative models can produce satisfactory outcomes on edge devices, but thorough efficiency evaluations are recommended before deployment in production environments.	en
dc.identifier.doi	10.18420/inf2024_104
dc.identifier.eissn	2944-7682
dc.identifier.isbn	978-3-88579-746-3
dc.identifier.issn	2944-7682
dc.identifier.pissn	1617-5468
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/45075
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e.V.
dc.relation.ispartof	INFORMATIK 2024
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject	Large Language Models
dc.subject	Generative Artificial Intelligence
dc.subject	Edge Devices
dc.subject	Efficiency
dc.title	LLMs on the Edge: Quality, Latency, and Energy Efficiency	en
dc.type	Text/Conference Paper
gi.citation.endPage	1192
gi.citation.publisherPlace	Bonn
gi.citation.startPage	1183
gi.conference.date	24-26 September 2024
gi.conference.location	Wiesbaden
gi.conference.sessiontitle	5. Workshop "KI in der Umweltinformatik" (KIU-2024)
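
The abstract reports throughput in tokens per second and energy in mWh per token, but the record does not state how these figures are derived. The sketch below shows one way the two relate. It assumes that energy comes from integrating power samples from an external meter over the generation window and that no idle baseline is subtracted. The function names are hypothetical, and this is not presented as the authors' measurement method. Note that 1 mWh = 3.6 J.

```python
"""Sketch: relating throughput and energy per token for an edge-device LLM run.

Assumption (not from the paper): energy is obtained by integrating power
samples from an external meter over the generation window, with no idle
baseline subtracted. All function names here are hypothetical.
"""
from typing import Sequence

J_PER_MWH = 3.6  # 1 mWh = 1e-3 Wh * 3600 s/h = 3.6 J


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput: generated tokens divided by wall-clock generation time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (W) over time (s) with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def mwh_per_token(energy_j: float, n_tokens: int) -> float:
    """Energy per generated token, converted from joules to milliwatt-hours."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return energy_j / n_tokens / J_PER_MWH


if __name__ == "__main__":
    # Hypothetical run: 50 tokens in 20 s at a constant 10 W.
    tps = tokens_per_second(50, 20.0)
    e = energy_joules([0.0, 10.0, 20.0], [10.0, 10.0, 10.0])
    print(f"{tps:.2f} tokens/s, {mwh_per_token(e, 50):.3f} mWh/token")
```

For this hypothetical run, the sketch gives 2.5 tokens/s and about 1.11 mWh per token. Equivalently, energy per token equals average power divided by throughput, so at a fixed power draw a faster model also uses less energy per token.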

Files

Original bundle

Name: Bast_et_al_LLMs_on_the_Edge.pdf
Size: 825.38 KB
Format: Adobe Portable Document Format

Collections

P352 - INFORMATIK 2024 - Lock in or log out? Wie digitale Souveränität gelingt