- Journal article: Fooling the masses with performance results: Old classics and some new ideas (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Wellein, G.; Hager, G. G. Wellein and G. Hager, Department for Computer Science and Erlangen Regional Computing Center, Friedrich-Alexander-Universität Erlangen-Nürnberg. In 1991, David H. Bailey published his insightful 'Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers.' In that humorous article, Bailey pinpointed typical 'evade and disguise' techniques for presenting mediocre performance results in the best possible light. At that time, the supercomputing landscape was governed by the 'chicken vs. oxen' debate: Could strong vector CPUs survive against the new massively parallel systems? In the past two decades, hybrid, hierarchical systems, multi-core processors, accelerator technology, and the dominating presence of commodity hardware have reshaped the landscape of High Performance Computing. It's also not so much oxen vs. chickens anymore; billions of ants have entered the battlefield. This talk gives an update of the 'Twelve Ways.' Old classics are presented alongside new 'stunts' that reflect today's technological boundary conditions. DISCLAIMER: Although these musings are certainly inspired by experience with many publications and talks in HPC, I wish to point out that (i) no offense is intended, (ii) I am not immune to the inherent temptations myself and (iii) this is all still just meant to be fun.
- Journal article: Achieving scalability for job centric monitoring in a distributed infrastructure (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Hilbrich, Marcus; Müller-Pfefferkorn, Ralph. Job centric monitoring makes it possible to observe jobs on remote computing resources. It may offer visualisation of recorded monitoring data and helps to find faulty or misbehaving jobs. If installations like grids or clouds are observed, monitoring data of many thousands of jobs has to be handled. The challenge for job centric monitoring infrastructures is to store, search and access the data collected in such huge installations. We address this challenge with a distributed, layer-based architecture that provides a uniform view of all monitoring data. This paper presents the concept of this infrastructure, called SLAte, and an analysis of its scalability.
- Journal article: Parallel coding for storage systems - An OpenMP and OpenCL capable framework (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Sobe, Peter. Parallel storage systems distribute data onto several devices. This allows the high access bandwidth that is needed for parallel computing systems. It also improves storage reliability, provided erasure-tolerant coding is applied and the coding is fast enough. In this paper we assume storage systems that apply data distribution and coding in a combined way. We describe how coding can be done in parallel on multicore and GPU systems in order to keep pace with the high storage access bandwidth. A framework is introduced that calculates coding equations from parameters and translates them into OpenMP- and OpenCL-based coding modules. These modules do the encoding for data that is written to the storage system, and do the decoding in case of failures of storage devices. We report on the performance of the coding modules and identify factors that influence the coding performance.
- Journal article: A Speed-Up Study for a Parallelized White Light Interferometry Preprocessing Algorithm on a Virtual Embedded Multiprocessor System (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Schoenwetter, Dominik; Schneider, Max; Fey, Dietmar. Parallel computing has been a niche for scientific research in academia for decades. However, as common industrial applications become more and more performance demanding, and raising the clock frequency of conventional single-core systems is hardly an option because technological limits have been reached, efficient use of (embedded) multi-core CPUs and many-core platforms has become imperative. 3D surface analysis of objects using white light interferometry is one such challenging application. The goal of this article is to estimate what speed-up is possible for an established, parallelized white light interferometry preprocessing algorithm, called the Contrast Method, on an embedded system that works without any operating system. To this end, we use a virtual environment that is able to simulate embedded multi-core as well as many-core systems and that enables running real application code on the designed system. The results show that a significant speed-up is possible when using a many-core platform instead of a design that implements only a single core, provided the algorithm is parallelized to take full advantage of the many-core design. Furthermore, an acceptable absolute run time is achievable.
- Journal article: 1. Current and Future Activities (Report of the Spokesperson) (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012)
- Journal article: Preliminary Call For Papers (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). 25th PARS Workshop
- Journal article: 10th Workshop on Parallel Systems and Algorithms PASA 2012 (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012)
- Journal article: Euro-Par 2013 Aachen (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012)
- Journal article: Parallelization Strategies to Speed-Up Computations for Terrain Analysis on Multi-Core Processors (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Schiele, Steffen; Blaar, Holger; Thürkow, Detlef; Möller, Markus; Müller-Hanneman, Matthias. Efficient computation of regional land-surface parameters for large-scale digital elevation models is becoming more and more important, in particular for web-based applications. This paper studies the possibilities of decreasing computing time for such tasks by parallel processing with multiple threads on multi-core processors. As an example of calculations of regional land-surface parameters, we investigate the computation of flow directions and propose a modified D8 algorithm using an extended neighborhood. We discuss two parallelization strategies, one based on a spatial decomposition, the other based on a two-phase approach. Three datasets of high-resolution digital elevation models with different geomorphological types of landscapes are used in our evaluation. While local surface parameters allow for an almost ideal speed-up, the situation is different for the calculation of non-local parameters due to data dependencies. Nevertheless, a significant decrease in computation time has still been achieved. A task pool-based strategy turns out to be more efficient for calculations on datasets with many data dependencies.
- Journal article: Resilient data encoding for fault-prone signal transmission in parallelized signed-digit based arithmetic (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Neuhäuser, David; Zehendner, Eberhard. When arithmetic components are parallelized, fault-prone interconnections can corrupt results significantly. Constantly progressing technology scaling leads to a steady increase in errors caused by faulty transmission. Resilient data encoding schemes can be used to offset these negative effects. Focusing on parallel signed-digit based arithmetic, which is frequently used in high-speed systems, we propose suitable data encodings that reduce error rates by 25%. Data encoding should be driven by the occurrence probabilities of digits. We develop a methodology to obtain these probabilities, show an example fault-tolerant encoding, and discuss its impact on communicating parallel arithmetic circuits in an example error scenario.