Title: Evaluation of GPU-Compression Algorithms for CUDA-Aware MPI
Authors: Vogel, Marco; Oden, Lena
Date: 2024-09-25
Year: 2024
ISSN: 0177-0454
URI: https://dl.gi.de/handle/20.500.12116/44641
Language: en
Type: Text/Journal Article
Abstract: This study evaluates compression algorithms suitable for use with CUDA-aware MPI, with the aim of reducing the latency of large GPU message transfers. We examine the performance of several compression algorithms on different datasets; ndzip emerges as the best-suited compression algorithm for our purposes. Our findings show that the latency of large messages can improve depending on the dataset, whereas for non-compressible data latency may increase due to the compression overhead. With well-compressible data, the Cannon algorithm for dense matrix-matrix multiplication improves performance by up to 30%. For data that is not highly compressible, there is only a minor performance penalty, as the compression overhead remains relatively small.
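The abstract describes the general pattern of compressing GPU-resident messages before handing them to a CUDA-aware MPI library. The following is a minimal sketch of that pattern only; the compress/decompress routines are placeholder stubs (plain device-to-device copies), not the authors' ndzip-based implementation, and the message sizes and tags are illustrative assumptions.

// Sketch: send a compressed GPU buffer over CUDA-aware MPI (run with mpirun -np 2).
// compress_on_gpu / decompress_on_gpu are placeholders; a real pipeline would
// invoke a GPU compressor kernel (e.g. ndzip in the paper) instead of a copy.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

// Placeholder "compression": copies the payload unchanged and returns its size.
static size_t compress_on_gpu(const float* d_in, uint8_t* d_out, size_t n_floats) {
    cudaMemcpy(d_out, d_in, n_floats * sizeof(float), cudaMemcpyDeviceToDevice);
    return n_floats * sizeof(float);
}

// Placeholder "decompression": copies the payload back unchanged.
static void decompress_on_gpu(const uint8_t* d_in, float* d_out, size_t n_floats) {
    cudaMemcpy(d_out, d_in, n_floats * sizeof(float), cudaMemcpyDeviceToDevice);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t n = 1 << 20;                          // 1M floats per message (illustrative)
    float*   d_data; cudaMalloc(&d_data, n * sizeof(float));
    uint8_t* d_comp; cudaMalloc(&d_comp, n * sizeof(float)); // worst-case compressed size

    if (rank == 0) {
        // Compress on the GPU, then send the compressed size and the device buffer.
        // With a CUDA-aware MPI, d_comp can be passed to MPI_Send directly.
        size_t comp_bytes = compress_on_gpu(d_data, d_comp, n);
        unsigned long long sz = comp_bytes;
        MPI_Send(&sz, 1, MPI_UNSIGNED_LONG_LONG, 1, 0, MPI_COMM_WORLD);
        MPI_Send(d_comp, (int)comp_bytes, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        unsigned long long sz = 0;
        MPI_Recv(&sz, 1, MPI_UNSIGNED_LONG_LONG, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(d_comp, (int)sz, MPI_BYTE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        decompress_on_gpu(d_comp, d_data, n);
        printf("rank 1 received %llu compressed bytes\n", sz);
    }

    cudaFree(d_data);
    cudaFree(d_comp);
    MPI_Finalize();
    return 0;
}

The trade-off the abstract reports follows from this structure: the extra compression and decompression steps add fixed overhead per message, which pays off only when the data compresses well enough to shrink the transfer noticeably.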