Fuentes Grau, Laura; Eiling, Niklas; Lankes, Stefan; Monti, Antonello
2024-09-25
2024-09-25
2024
0177-0454
https://dl.gi.de/handle/20.500.12116/44643
Modern HPC clusters increasingly make use of accelerators, such as GPUs, to achieve the computational throughput that today’s applications require. Distributing computations across heterogeneous computing nodes necessitates a vast number of inter-device data transfers, not only between nodes but also within them. Each type of device defines its own API to handle these transfers. These APIs differ in their implementation but fulfill the same task: exchanging data between memory regions. To meet demanding bandwidth and latency requirements, many device interfaces offer asynchronous APIs that enable hardware offloading of data transfers. This paper introduces TosKonnect, which unifies asynchronous device communication while keeping configurability and interoperability in mind. TosKonnect is a queue-based communication layer that defines a vendor-neutral and device-independent API for inter-device data transfers while hiding the intricate details of device communication APIs. Because TosKonnect adds little overhead to device communication, it provides developers with a performant tool to organize data transfers.
en
TosKonnect: A Modular Queue-based Communication Layer for Heterogeneous High Performance Computing
Text/Journal Article