Browsing by author "Christgau, Steffen"
- Journal article: Comparing MPI Passive Target Synchronization Schemes on a Non-Cache-Coherent Shared-Memory Processor (PARS-Mitteilungen: Vol. 35, No. 1, 2020)
  Christgau, Steffen; Schnor, Bettina
  MPI passive target synchronisation offers exclusive and shared locks. These are the building blocks for implementing applications with Readers & Writers semantics, such as distributed hash tables. This paper discusses the implementation of MPI passive target synchronisation on a non-cache-coherent multicore processor, the Intel Single-Chip Cloud Computer. The algorithms considered differ in their communication style, their data structures, and their semantics. It is shown that shared-memory approaches scale very well and deliver good performance, even in the absence of cache coherence.
- Journal article: A comparison of CUDA and OpenACC: Accelerating the Tsunami Simulation EasyWave (PARS-Mitteilungen: Vol. 31, No. 1, 2014)
  Christgau, Steffen; Spazier, Johannes; Schnor, Bettina; Hammitzsch, Martin; Babeyko, Andrey; Wächter, Joachim
  This paper presents a GPU-accelerated version of the tsunami simulation EasyWave. Using two different GPU generations (Nvidia Tesla and Fermi), different optimization techniques following the principle of locality were applied to the application, and their performance impact was analyzed on both hardware generations. The Fermi GPU not only has more cores but also has an L2 cache shared by all streaming multiprocessors. It is shown that even the most tuned code on the Tesla does not reach the performance of the unoptimized code on the Fermi GPU. Further, a comparison between CUDA and OpenACC shows that the platform-independent approach does not reach the speed of the native CUDA code. A deeper analysis shows that memory access patterns have a critical impact on the performance of the compute kernels, although this seems to be caused by the compiler in use.
- Journal article: Design of MPI Passive Target Synchronization for a Non-Cache-Coherent Many-Core Processor (PARS-Mitteilungen: Vol. 34, No. 1, 2017)
  Christgau, Steffen; Schnor, Bettina
  Distributed hash tables are a common approach for fast data access. For this kind of application, a synchronization scheme with Readers and Writers semantics is well suited. This paper presents the design of an implementation of MPI passive target synchronization with Readers and Writers semantics. The implementation is discussed for the Single-Chip Cloud Computer, a non-cache-coherent many-core CPU with shared memory.
- Journal article: Gaining Cross-Platform Parallelism for HAL’s Molecular Dynamics Package using SYCL (PARS-Mitteilungen: Vol. 36, 2024)
  Skoblin, Viktor; Höfling, Felix; Christgau, Steffen
  Molecular dynamics simulations are one of the methods in scientific computing that benefit from GPU acceleration. For these devices, SYCL is a promising API for writing portable code. In this paper, we present a case study of HAL’s MD package, which has been successfully migrated from CUDA to SYCL. We describe the different strategies that we followed in porting the code. Following these strategies, we achieved code portability across the major GPU vendors. Depending on the kernel, both significant performance improvements and regressions are observed. As a side effect of the migration, we also obtained impressive speedups for execution on CPUs.
- Journal article: Synchronization of MPI One-Sided Communication on a Non-Cache-Coherent Many-Core System (PARS-Mitteilungen: Vol. 33, No. 1, 2016)
  Christgau, Steffen; Schnor, Bettina
  This paper discusses the design and implementation of MPI’s general active target synchronization on the Intel Single-Chip Cloud Computer, a non-cache-coherent many-core CPU. Measurements show a performance improvement by a factor of four compared to the default SCC-tuned MPI implementation and demonstrate that a shared-memory protocol can be implemented efficiently despite the lack of cache coherence. Further, a classification of implementation designs for MPI’s general active target synchronization is presented.
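Several of the abstracts above refer to the Readers & Writers semantics of MPI passive target synchronization: in MPI, a process calls MPI_Win_lock with MPI_LOCK_SHARED to read a window or MPI_LOCK_EXCLUSIVE to write it, and releases it with MPI_Win_unlock. The semantics alone can be illustrated, independent of MPI, with a minimal counter-based readers-writer lock in C11. This is a hypothetical sketch: the type and function names (rw_lock, rw_try_lock_shared, and so on) are invented for illustration, it assumes a cache-coherent machine with C11 atomics, and it is not one of the algorithms from the papers, which target the non-cache-coherent SCC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration of shared/exclusive lock semantics only.
 * state == 0  : unlocked
 * state  > 0  : held by `state` readers (shared mode)
 * state == -1 : held by exactly one writer (exclusive mode)
 * Assumes cache-coherent C11 atomics; NOT the SCC algorithms from the papers. */
typedef struct {
    atomic_int state;
} rw_lock;

void rw_init(rw_lock *l) { atomic_init(&l->state, 0); }

/* Shared mode: succeeds unless a writer currently holds the lock. */
bool rw_try_lock_shared(rw_lock *l) {
    int s = atomic_load(&l->state);
    while (s >= 0) {
        /* On failure (including spurious failure), s is reloaded and we retry. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return true;
    }
    return false; /* a writer holds the lock */
}

/* Exclusive mode: succeeds only if nobody holds the lock. */
bool rw_try_lock_exclusive(rw_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

void rw_unlock_shared(rw_lock *l) {
    int prev = atomic_fetch_sub(&l->state, 1);
    assert(prev > 0); /* must have been held in shared mode */
    (void)prev;
}

void rw_unlock_exclusive(rw_lock *l) {
    int expected = -1;
    bool ok = atomic_compare_exchange_strong(&l->state, &expected, 0);
    assert(ok); /* must have been held in exclusive mode */
    (void)ok;
}

/* Blocking variants simply spin on the try-variants. */
void rw_lock_shared(rw_lock *l)    { while (!rw_try_lock_shared(l)) { } }
void rw_lock_exclusive(rw_lock *l) { while (!rw_try_lock_exclusive(l)) { } }
```

Readers only increment the counter, so any number of them can hold the lock at the same time, while a writer needs the counter to be exactly zero. Because new readers may keep arriving, this simple scheme can starve writers.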