- ZeitschriftenheftPARS-Mitteilungen 2016(PARS-Mitteilungen: Vol. 33, Nr. 1, 2016)
- ZeitschriftenartikelOptimizing Parallel Runtime of Cryptanalytic Algorithms by Selecting Between Word-Parallel and Bit-Serial Variants of Program Parts(PARS-Mitteilungen: Vol. 33, Nr. 1, 2016) Eitschberger, Patrick; Keller, JörgCryptanalytic algorithms such as dictionary attacks, that test huge numbers of keys to decrypt a ciphertext to a certain plaintext, need lots of computational resources and efficient coding, but allow large scale parallelism such as many-cores plus GPUs. Some attacks have profited from a bit-serial data representation, that allows SIMD-like coding per thread and increases the degree of parallelism. We investigate the question how to decide for distinct parts of such algorithms whether to code them in a bit-serial or normal word-parallel manner. Given bit-serial and word-parallel variants for each part of the cryptographic algorithm, we benchmark the runtime of the variants, and additionally the runtime of the conversion between the different data rep- resentations. Then we model the resulting variant selection problem as a direct graph — in the fashion of a global composition optimization problem — and find the optimal runtime by computing the shortest path from source to sink node. We evaluate our approach with the Advanced Encryption Standard (AES) and demonstrate runtime advantages.
- ZeitschriftenartikelSimulation based Analysis of Memory Access Conflicts for Heterogeneous Multi-Core Platforms(PARS-Mitteilungen: Vol. 33, Nr. 1, 2016) Brandenburg, Jens; Stabernack, BennoBesides aspects of HW/SW partitioning, resource allocation and mapping, also the optimization of the memory subsystem plays a crucial role during the complex HW/SW co-design and co-optimization process. Especially for memory bound applications, like state of the art video codecs, the memory subsystem has become one of the bottlenecks limiting the performance gains from parallelization and HW accelerated approaches. Memory access conflicts, due to the concurrent access to a shared memory location, are a major source of this bottleneck. To develop counter strategies and to optimize the design, an in-depth analysis of all memory access conflicts is necessary and required. In order to provide this analysis, we propose a flexible tracing and profiling methodology, which provides a timing-accurate memory access conflict analysis for SystemC-based platform simulation models. In a case study this memory access conflict analysis is performed for a heterogeneous platform running a parallel high efficiency video coding (HEVC) intra encoder application. This analysis leads to an optimized design, which reduces the number of memory access conflicts and shows significant performance gains for the target video encoder application.
- ZeitschriftenartikelEmbedded Parallel Computing Accelorators for Smart Control Units of Frequency Converters(PARS-Mitteilungen: Vol. 33, Nr. 1, 2016) Vaas, Steffen; Reichenbach, Marc; Hofmann, Johannes; Stadelmayer, Thomas; Fey, DietmarClassical frequency converters are designed as embedded devices optimized for a specific application-field. But in times of Industry 4.0 simple frequency converters change to smart control units and become more intelligent with analysis and reporting functions to build up smart grids in automation systems for reducing maintenance costs and increasing productivity. To realize these new functions, an evaluation is needed, which kind of computer architectures should be used for these new devices. Due to more complex algorithms, classical microcontrollers are not sufficient anymore. Therefore, we show in this paper, if and how microprocessors in smart control units can benefit from highly parallel hardware accelerators. Consequently, we propose to increase the performance of an ARM Cortex-A9 processor by using an Epiphany III E16 many-core processor as hardware accelerator for complex analysis tasks. Our results show, that a speedup of 1.78 can be achieved, while the power consumption is increased by only 9%.
- ZeitschriftenartikelSynchronization of MPI One-Sided Communication on a Non-Cache-Coherent Many-Core System(PARS-Mitteilungen: Vol. 33, Nr. 1, 2016) Christgau, Steffen; Schnor, BettinaThis paper discusses the design and implementation of MPI’s general active target synchronization on the Intel Single-Chip Cloud Computer, a non-cache-coherent many-core CPU. Measurements show a performance benefit of a factor of four compared to the default SCC-tuned MPI implementation and demonstrate the feasibility of implementing efficiently a shared memory protocol despite the lack of cache coherence. Further, a classification of implementation designs of MPI’s general active target synchronization is presented.