Browsing by author "Hofmann, Johannes"
- Journal article: Embedded Parallel Computing Accelerators for Smart Control Units of Frequency Converters (PARS-Mitteilungen: Vol. 33, No. 1, 2016). Vaas, Steffen; Reichenbach, Marc; Hofmann, Johannes; Stadelmayer, Thomas; Fey, Dietmar. Classical frequency converters are designed as embedded devices optimized for a specific application field. But in times of Industry 4.0, simple frequency converters are turning into smart control units and becoming more intelligent, with analysis and reporting functions used to build up smart grids in automation systems, reducing maintenance costs and increasing productivity. To realize these new functions, an evaluation is needed of which kind of computer architecture should be used for these new devices. Due to more complex algorithms, classical microcontrollers are no longer sufficient. Therefore, we show in this paper whether and how microprocessors in smart control units can benefit from highly parallel hardware accelerators. Consequently, we propose to increase the performance of an ARM Cortex-A9 processor by using an Epiphany III E16 many-core processor as a hardware accelerator for complex analysis tasks. Our results show that a speedup of 1.78 can be achieved, while the power consumption is increased by only 9%.
- Journal article: Fast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs (PARS-Mitteilungen: Vol. 30, No. 1, 2013). Hofmann, Johannes; Fey, Dietmar. We use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each other, and display different requirements for the underlying hardware as well as software. We can use these motives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thus, considering time-to-solution, EAs should be run on GPUs whenever possible.
- Journal article: Fast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013). Hofmann, Johannes; Fey, Dietmar. We use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each other, and display different requirements for the underlying hardware as well as software. We can use these motives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thus, considering time-to-solution, EAs should be run on GPUs whenever possible.
- Journal article: Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator (PARS-Mitteilungen: Vol. 31, No. 1, 2014). Hofmann, Johannes; Treibig, Jan; Hager, Georg; Wellein, Gerhard. We examine the Xeon Phi, which is based on Intel’s Many Integrated Cores architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we perform various micro-benchmarks to investigate the platform’s new set of vector instructions and put special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it for the Xeon Phi to validate the performance of our optimized hand-written assembly implementation, as well as the performance of several different auto-vectorization approaches.