Listing of PARS-Mitteilungen 2020 by publication date
- Journal article: Influence of Discretization of Frequencies and Processor Allocation on Static Scheduling of Parallelizable Tasks with Deadlines (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Litzinger, Sebastian; Keller, Jörg. Models for energy-efficient static scheduling of parallelizable tasks with deadlines on frequency-scalable parallel machines comprise moldable vs. malleable tasks and continuous vs. discrete frequency levels. We investigate the tradeoff between scheduling time and energy efficiency when going from continuous to discrete processor allocation and frequency levels. To this end, we present a tool to convert a schedule computed for malleable tasks on machines with continuous frequency scaling (P. Sanders, J. Speck, Euro-Par 2012) into one for moldable tasks on a machine with discrete frequency levels. We compare the energy efficiency of the converted schedule to the energy consumed by a schedule produced by the integrated crown scheduler (N. Melot et al., ACM TACO 2015) for moldable tasks and a machine with discrete frequency levels. Our experiments indicate that the converted Sanders-Speck schedules, while computed faster, consume more energy on average than crown schedules. Surprisingly, it is not the step from malleable to moldable tasks that is responsible, but the step from continuous to discrete frequency levels.
- Journal article: Comparing MPI Passive Target Synchronization Schemes on a Non-Cache-Coherent Shared-Memory Processor (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Christgau, Steffen; Schnor, Bettina. MPI passive target synchronisation offers exclusive and shared locks. These are the building blocks for implementing applications with Readers & Writers semantics, such as distributed hash tables. This paper discusses the implementation of MPI passive target synchronisation on a non-cache-coherent multicore, the Intel Single-Chip Cloud Computer. The considered algorithms differ in their communication style, their data structures, and their semantics. It is shown that shared-memory approaches scale very well and deliver good performance, even in the absence of cache coherence.
- Journal article: Evaluating the Usability of Asynchronous Runge-Kutta Methods for Solving ODEs (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Greene, Christopher; Hoffmann, Markus. Combining asynchronous methods with scientific computing is a great challenge. In this paper, we attempt to combine such methods with an ODE solver. Although the results do not yet yield a fully usable asynchronous method, this paper shows the direction of development needed to obtain one.
- Journal article: Generating Optimized FPGA Based MPSoCs to Parallelize Legacy Embedded Software with Customizable Throughput (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Heid, Kris; Hochberger, Christian. Executing legacy software on newly developed systems can lead to problems regarding the required throughput of the software. Automatic software parallelization can help to achieve a desired execution time even if a single-core version would be too slow. In this contribution, we present a toolset that automatically parallelizes given legacy software and distributes it to multiple soft-cores forming a processing pipeline. As a goal for the parallelization, the user can provide a minimum throughput that has to be achieved. Although this concept is limited to repetitive tasks, it can be applied well to most embedded system applications. The results show that the tool achieves remarkable speedups without any manual intervention or code restructuring for a spectrum of benchmarks.
- Journal article: Enabling Malleability for Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics using LAIK (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Raoofy, Amir; Yang, Dai; Weidendorfer, Josef; Trinitis, Carsten; Schulz, Martin. Malleability, i.e., the ability for an application to release or acquire resources at runtime, has many benefits for current and future HPC systems. Implementing such functionality, however, is already difficult in newly written code and an even more daunting challenge when considering a dynamic and flexible parallel programming model that separates data and execution into two orthogonal concerns. These properties promise easier malleability as the runtime can partition resources dynamically as needed, as well as easier incremental porting of existing MPI code. In this paper, we explore the malleability of LAIK with the help of laik-lulesh, a LAIK-based port of LULESH, a proxy application from the CORAL benchmark suite. We show the steps required for porting the application to LAIK, and we present detailed scaling experiments that show promising results.
- Journal article: GPU-beschleunigte Time Warping-Distanzen [GPU-Accelerated Time Warping Distances] (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Bachmann, Jörg P.; Trogant, Kevin M.; Freytag, Johann-C. A growing number of algorithms have been accelerated by several orders of magnitude through implementation on GPUs. In particular, highly parallel implementations exist for Dynamic Time Warping (DTW), an algorithm widely used in time series analysis. This algorithm computes a similarity value for two time series (e.g., temperature curves) while accounting for temporal variations such as time shifts. Unfortunately, the existing GPU implementations of DTW cannot account for arbitrary temporal variations. In this work, we present GPU implementations that are not subject to this restriction. In our evaluation, we show that they achieve a speed advantage of about two orders of magnitude over a CPU implementation.
- Journal issue: PARS-Mitteilungen 2020 (PARS-Mitteilungen: Vol. 35, No. 1, 2020)
- Journal article: Reducing DRAM Accesses through Pseudo-Channel Mode (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Salehiminapour, Farzaneh; Lucas, Jan; Goebel, Matthias; Juurlink, Ben. Applications once exclusive to high-performance computing are now common in systems ranging from mobile devices to clusters. They typically require large amounts of memory bandwidth. The graphics DRAM interface standards GDDR5X and GDDR6 are new DRAM technologies that promise almost doubled data rates compared to GDDR5. However, these higher data rates require a longer burst length of 16 words. This would typically increase the memory access granularity. However, GDDR5X and GDDR6 support a feature called pseudo-channel mode. In pseudo-channel mode, the memory is split into two 16-bit pseudo channels. This split keeps the memory access granularity constant compared to GDDR5. However, the pseudo channels are not fully independent channels. Two accesses can be performed at the same time, but access type, bank, and page must match, while the column address can be selected separately for each pseudo channel. With this restriction, we argue that GDDR5X can best be seen as a GDDR5 memory that allows performing an additional request to the same page without extra cost. Therefore, we propose a DRAM buffer scheduling algorithm to make effective use of the pseudo-channel mode and the additional memory bandwidth offered by GDDR5X. Compared to the GDDR5X regular mode, our proposed algorithm achieves 12.5% to 18% memory access reduction on average in pseudo-channel mode.
- Journal article: Symptom-based Fault Detection in Modern Computer Systems (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Becker, Thomas; Rudolf, Nico; Yang, Dai; Karl, Wolfgang. Miniaturization and the increasing number of components, which get steadily more complex, lead to a rising failure rate in modern computer systems. Soft hardware errors in particular are a major problem because they are usually temporary and therefore hard to detect. As classical fault-tolerance methods are very costly and reduce system efficiency, light-weight methods are needed to increase system reliability. A method that meets this requirement is symptom-based fault detection. In this work, we evaluate the ability to detect different faults with symptom-based fault detection by using hardware performance counters. As the knowledge of a fault occurrence is usually not enough, we also evaluate the possibility of drawing conclusions about which fault occurred. For the evaluation, we used the fault-injection library FINJ and manually manipulated loops. The results show that symptom-based fault detection enables the system to detect faulty application behavior; however, fine-grained conclusions about the causing fault are hardly possible.
- Journal article: The Evolution of Secure Hash Algorithms (PARS-Mitteilungen: Vol. 35, No. 1, 2020). Pfautsch, Frederik; Schubert, Nils; Orglmeister, Conrad; Gebhart, Maximilian; Habermann, Philipp; Juurlink, Ben. Hashing algorithms are a popular tool for saving passwords securely or for file verification. Storing plain-text passwords is problematic if the database gets exposed. However, it is also a problem if the hashing algorithm used is outdated. Short passwords can be attacked with brute-force search, hence recommendations of a minimal password length are common. Given that computer performance has increased significantly during the last decades, outdated hashes, especially those generated from short passwords, are vulnerable today. We evaluate the resilience of SHA-1 and SHA-3 hashing against brute-force attacks on a 24-core dual-processor system, as well as on a modern UltraScale+ FPGA. Reaching a peak performance of 4.45 Ghashes, we are able to find SHA-1 hashed passwords with a length of up to six characters within three minutes. This time increases by a factor of 5.5 for the more secure SHA-3 algorithm due to its higher complexity. We furthermore present a study of how the average cracking time grows with increasing password length. To be resilient against brute-force attacks, we therefore recommend a minimum password length of 8 characters, which increases the needed computing time to several days (SHA-1) or weeks (SHA-3) on average.
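
The password-length arithmetic in the last abstract can be sketched roughly as follows. This is only a back-of-the-envelope illustration, not the authors' method. It assumes an alphabet of 95 printable ASCII characters and reads the reported peak as 4.45 billion hashes per second; neither assumption is stated in the listing. The function names `keyspace` and `exhaustive_seconds` are hypothetical.

```python
def keyspace(alphabet_size: int, max_length: int) -> int:
    """Count all candidate passwords of length 1..max_length."""
    return sum(alphabet_size ** k for k in range(1, max_length + 1))


def exhaustive_seconds(alphabet_size: int, max_length: int,
                       hashes_per_second: float) -> float:
    """Time to hash every candidate once at a constant hash rate."""
    return keyspace(alphabet_size, max_length) / hashes_per_second


ALPHABET = 95          # assumption: printable ASCII characters
SHA1_RATE = 4.45e9     # assumption: reported peak read as hashes per second
SHA3_SLOWDOWN = 5.5    # slowdown factor quoted in the abstract

if __name__ == "__main__":
    for length in (6, 8):
        sha1 = exhaustive_seconds(ALPHABET, length, SHA1_RATE)
        sha3 = sha1 * SHA3_SLOWDOWN
        # On average a match is found after about half the search space.
        print(f"length <= {length}: SHA-1 full search {sha1 / 86400:.3f} days, "
              f"average {sha1 / 2 / 86400:.3f} days; "
              f"SHA-3 average {sha3 / 2 / 86400:.3f} days")
```

Under these assumptions, a full search up to six characters stays below three minutes, while eight characters push the average SHA-1 search into the range of days, which is consistent with the orders of magnitude the abstract reports.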