Auflistung PARS-Mitteilungen 2013 nach Erscheinungsdatum
1 - 10 von 21
Treffer pro Seite
Sortieroptionen
- ZeitschriftenartikelReal-time Simulation of Human Vision using Temporal Compositing with CUDA on the GPU(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Nießner, Matthias; Kuhnert, Nadine; Selgrad, Kai; Stamminger, Marc; Michelson, GeorgWe present a novel approach that simulates human vision including visual defects such as glaucoma by temporal composition of human vision in real-time on the GPU. Thereforewe determine eye focus points every time step and adapt the lens accommodation of our virtual eye model accordingly. The focal distance is then used to determine bluriness of observed scene regions; i.e.we compute defocus for all visible pixels. In order to simulate the visual memory we introduce a sharpness field where we integrate defocus values temporally. That allows for memorizing sharply perceived scene points. For visualizationwe ray trace the virtual scene environment while incorporating depth of field based on the sharpness field data. Thusour al- gorithm facilitates the simulation of human vision mimicing the visual memory. We consider this to be particularly useful for illustration purposes for patients with vi- sual defects such as glaucoma. In order to run our algorithm in real-time we employ massively parallel graphics hardware.
- ZeitschriftenartikelComparison of PGAS Languages on a Linked Cell Algorithm(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Bauer, Martin; Kuschel, Christian; Ritter, Daniel; Sembritzki, KlausThe intention of partitioned global address space (PGAS) languages is to decrease developing time of parallel programs by abstracting the view on the memory and communication. Despite the abstraction a decent speed-up is promised. In this paper the performance and implementation time of Co-Array Fortran (CAF)Unified Parallel C (UPC) and Cascade High Productivity Language (Chapel) are compared by means of a linked cell algorithm. An MPI parallel reference implementation in C is ported to CAFChapel and UPCrespectivelyand is optimized with respect to the available features of the corresponding language. Our tests show parallel programs are developed faster with the above mentioned PGAS languages as compared to MPI. We experienced a performance penalty for the PGAS versions that can be reduced at the expense of a similar programming effort as for MPI. Programmers should be aware that the utilization of PGAS languages may lead to higher administrative effort for compiling and executing programs on different super-computers.
- ZeitschriftenartikelEnergy-Efficient Static Scheduling of Streaming Task Collections with Malleable Tasks(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Kessler, Christoph; Eitschberger, Patrick; Keller, JörgWe investigate the energy-efficiency of streaming task collections with parallelizable or malleable tasks on a manycore processor with frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently so that cores typically run several tasks that are scheduled round-robin on user level. A stream of data flows through the tasks and intermediate results are forwarded to other tasks like in a pipelined task graph. We first show the equivalence of task mapping for streaming task collections and normal task collections in the case of continuous frequency scalingunder reasonable assumptions for the user-level schedulerif a makespani.e. a throughput requirement of the streaming applicationis given and the energy consumed is to be minimized. We then show that in the case of discrete frequency scalingit might be necessary for processors to switch frequenciesand that idle times still can occurin contrast to continuous frequency scaling. We formulate the mapping of (streaming) task collections on a manycore processor with discrete frequency levels as an integer linear program. Finallywe propose two heuristics to reduce energy consumption compared to the previous results by improved load bal- ancing through the parallel execution of a parallelizable task. We evaluate the effects of the heuristics analytically and experimentally on the Intel SCC.
- ZeitschriftenartikelComparison of PGAS Languages on a Linked Cell Algorithm(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Bauer, Martin; Kuschel, Christian; Ritter, Daniel; Sembritzki, KlausThe intention of partitioned global address space (PGAS) languages is to decrease developing time of parallel programs by abstracting the view on the memory and communication. Despite the abstraction a decent speed-up is promised. In this paper the performance and implementation time of Co-Array Fortran (CAF), Unified Parallel C (UPC) and Cascade High Productivity Language (Chapel) are compared by means of a linked cell algorithm. An MPI parallel reference implementation in C is ported to CAF, Chapel and UPC, respectively, and is optimized with respect to the available features of the corresponding language. Our tests show parallel programs are developed faster with the above mentioned PGAS languages as compared to MPI. We experienced a performance penalty for the PGAS versions that can be reduced at the expense of a similar programming effort as for MPI. Programmers should be aware that the utilization of PGAS languages may lead to higher administrative effort for compiling and executing programs on different super-computers.
- ZeitschriftenartikelEnergy-Efficient Static Scheduling of Streaming Task Collections with Malleable Tasks(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Kessler, Christoph; Eitschberger, Patrick; Keller, JörgWe investigate the energy-efficiency of streaming task collections with parallelizable or malleable tasks on a manycore processor with frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin on user level. A stream of data flows through the tasks and intermediate results are forwarded to other tasks like in a pipelined task graph. We first show the equivalence of task mapping for streaming task collections and normal task collections in the case of continuous frequency scaling, under reasonable assumptions for the user-level scheduler, if a makespan, i.e. a throughput requirement of the streaming application, is given and the energy consumed is to be minimized. We then show that in the case of discrete frequency scaling, it might be necessary for processors to switch frequencies, and that idle times still can occur, in contrast to continuous frequency scaling. We formulate the mapping of (streaming) task collections on a manycore processor with discrete frequency levels as an integer linear program. Finally, we propose two heuristics to reduce energy consumption compared to the previous results by improved load balancing through the parallel execution of a parallelizable task. We evaluate the effects of the heuristics analytically and experimentally on the Intel SCC.
- ZeitschriftenartikelSelf-organizing Core Allocation(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Ziermann, Tobias; Wildermann, Stefan; Teich, JürgenThis paper deals with the problem of dynamic allocation of cores to parallel applications on homogeneous many-core systems such as, for example, MultiProcessor System-on-Chips (MPSoCs). For a given number of thread-parallel applications, the goal is to find a core assignment that maximizes the average speedup. However, the difficulty is that some applications may have a higher speedup variation than others when assigned additional cores. This paper first presents a centralized algorithm to calculate an optimal assignment for the above objective. However, as the number of cores and the dynamics of applications will significantly increase in the future, decentralized concepts are necessary to scale with this development. Therefore, a decentralized (self-organizing) algorithm is developed in order to minimize the amount of global information that has to be exchanged between applications. The experimental results show that this approach can reach the optimal result of the centralized version in average by 98.95%.
- ZeitschriftenartikelFast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Hofmann, Johannes; Fey, DietmarWe use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each otherand display different requirements for the underlying hardware as well as software. We can use these mo- tives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thusconsidering time-to-solutionEAs should be run on GPUs whenever possible.
- ZeitschriftenartikelParallel Prefiltering for Accelerating HHblits on the Convey HC-1(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Bromberger, Michael; Nowak, FabianHHblits is a bioinformatics application for finding proteins with common ances- tors. To achieve more sensitivitythe protein sequences of the query are not compared directly against the database protein sequencesbut rather their Hidden Markov Models are compared. ThusHHblits is very time-consuming and therefore needs to be accelerated. A multi-FPGA system such as the Convey HC-1 is a promising candiate to achieve acceleration. We present the design and implementation of a parallel coprocessor on the Convey HC-1 to accelerate HHblits after analyzing the application toward acceleration candidates. We achieve a speedup of 117.5× against a sequential implementation for FPGA-suitable data sizes per kernel and negligible speedup for the entire uniprot20 protein database against an optimized SSE implementation.
- ZeitschriftenartikelFast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Hofmann, Johannes; Fey, DietmarWe use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each other, and display different requirements for the underlying hardware as well as software. We can use these motives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thus, considering time-to-solution, EAs should be run on GPUs whenever possible.
- ZeitschriftenheftPARS-Mitteilungen 2013(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013)
- «
- 1 (current)
- 2
- 3
- »