LLMs on the Edge: Quality, Latency, and Energy Efficiency
dc.contributor.author | Bast, Sebastian | |
dc.contributor.author | Begic Fazlic, Lejla | |
dc.contributor.author | Naumann, Stefan | |
dc.contributor.author | Dartmann, Guido | |
dc.contributor.editor | Klein, Maike | |
dc.contributor.editor | Krupka, Daniel | |
dc.contributor.editor | Winter, Cornelia | |
dc.contributor.editor | Gergeleit, Martin | |
dc.contributor.editor | Martin, Ludger | |
dc.date.accessioned | 2024-10-21T18:24:12Z | |
dc.date.available | 2024-10-21T18:24:12Z | |
dc.date.issued | 2024 | |
dc.description.abstract | Generative Artificial Intelligence has become integral to many people's lives, with Large Language Models (LLMs) gaining popularity in both science and society. While training these models is known to require significant energy, inference also contributes substantially to their total energy consumption. This study investigates how to use LLMs sustainably by examining the efficiency of inference, particularly on local hardware with limited computing resources. We develop metrics to quantify the efficiency of LLMs on edge devices, focusing on quality, latency, and energy consumption. Our comparison of three state-of-the-art generative models on edge devices shows that they achieve quality scores ranging from 73.3% to 85.9%, generate 1.83 to 3.51 tokens per second, and consume between 0.93 and 1.76 mWh of energy per token on a single-board computer without GPU support. The findings suggest that generative models can produce satisfactory outcomes on edge devices, but thorough efficiency evaluations are recommended before deployment in production environments. | en |
dc.identifier.doi | 10.18420/inf2024_104 | |
dc.identifier.isbn | 978-3-88579-746-3 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/45075 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | INFORMATIK 2024 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352 | |
dc.subject | Large Language Models | |
dc.subject | Generative Artificial Intelligence | |
dc.subject | Edge Devices | |
dc.subject | Efficiency | |
dc.title | LLMs on the Edge: Quality, Latency, and Energy Efficiency | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 1192 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 1183 | |
gi.conference.date | 24.-26. September 2024 | |
gi.conference.location | Wiesbaden | |
gi.conference.sessiontitle | 5. Workshop "KI in der Umweltinformatik" (KIU-2024) |
Files
Original bundle
- Name: Bast_et_al_LLMs_on_the_Edge.pdf
- Size: 825.38 KB
- Format: Adobe Portable Document Format
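
The abstract reports latency in tokens per second and energy in mWh per token. This record does not give the paper's exact metric definitions, so the following is only a minimal sketch of how such figures could be derived from a single measured inference run. The function names and the input values (token count, duration, average power draw) are hypothetical and are not taken from the paper.

```python
def tokens_per_second(num_tokens: int, duration_s: float) -> float:
    """Throughput of one inference run (assumed definition)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return num_tokens / duration_s


def energy_per_token_mwh(avg_power_w: float, duration_s: float, num_tokens: int) -> float:
    """Energy per generated token in milliwatt-hours (assumed definition).

    Energy in Wh is average power (W) times duration (h); the result is
    converted to mWh and divided by the number of generated tokens.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    energy_wh = avg_power_w * duration_s / 3600.0
    return energy_wh * 1000.0 / num_tokens


# Hypothetical run: 100 tokens generated in 40 s at an average draw of 10 W.
throughput = tokens_per_second(100, 40.0)
energy = energy_per_token_mwh(10.0, 40.0, 100)
print(f"{throughput:.2f} tokens/s, {energy:.2f} mWh/token")
```

Under these definitions, energy per token is average power divided by throughput, so a slower model on the same board would tend to cost more energy per token unless it also draws less power.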