PARS-Mitteilungen 2013

https://dl.gi.de/handle/20.500.12116/1912

Journal article
Acceleration of Optical Flow Computations on Tightly-Coupled Processor Arrays
(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Sousa, Éricles Rodrigues; Tanase, Alexandru; Lari, Vahid; Hannig, Frank; Teich, Jürgen; Paul, Johny; Stechele, Walter; Kröhnert, Manfred; Asfour, Tamim
Optical flow is widely used in many applications of portable mobile devices and automotive embedded systems to determine the motion of objects in a visual scene. In robotics, it is also used for motion detection, object segmentation, time-to-contact information, focus of expansion calculations, robot navigation, and automatic parking for vehicles. Similar to many other image processing algorithms, optical flow applies pixel operations repeatedly over whole image frames. Thus, it provides a high degree of fine-grained parallelism, which can be efficiently exploited on massively parallel processor arrays. In this context, we propose to accelerate the computation of complex motion estimation vectors on programmable tightly-coupled processor arrays, which offer high flexibility enabled by coarse-grained reconfiguration capabilities. A further novelty is that the degree of parallelism may be adapted to the number of processors that are available to the application. Finally, we present an implementation that (a) is 18 times faster than an FPGA-based soft processor implementation and (b) may be adapted to different QoS requirements, making it more flexible than a dedicated hardware implementation.
Journal article
Comparison of PGAS Languages on a Linked Cell Algorithm
(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Bauer, Martin; Kuschel, Christian; Ritter, Daniel; Sembritzki, Klaus
Partitioned global address space (PGAS) languages are intended to decrease the development time of parallel programs by abstracting the view of memory and communication. Despite the abstraction, a decent speed-up is promised. In this paper, the performance and implementation time of Co-Array Fortran (CAF), Unified Parallel C (UPC), and Cascade High Productivity Language (Chapel) are compared by means of a linked cell algorithm. An MPI-parallel reference implementation in C is ported to CAF, Chapel, and UPC, and each port is optimized with respect to the available features of the corresponding language. Our tests show that parallel programs are developed faster with the above-mentioned PGAS languages than with MPI. We experienced a performance penalty for the PGAS versions that can be reduced at the expense of a programming effort similar to that for MPI. Programmers should be aware that using PGAS languages may lead to higher administrative effort for compiling and executing programs on different supercomputers.
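For readers unfamiliar with the benchmark kernel, the following is a minimal serial sketch of the general linked cell (cell list) idea in Python. It is not the authors' implementation: the 2D setting, the function names `build_cells` and `neighbor_pairs`, and the cell side equal to the cutoff radius are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math


def build_cells(positions, cutoff):
    """Bin 2D particle indices into square cells of side `cutoff` (assumed layout)."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        cells[(math.floor(x / cutoff), math.floor(y / cutoff))].append(i)
    return cells


def neighbor_pairs(positions, cutoff):
    """Return all index pairs (i, j) with i < j whose distance is below `cutoff`."""
    cells = build_cells(positions, cutoff)
    pairs = set()
    for (cx, cy), members in cells.items():
        # Only the 3x3 block of cells around (cx, cy) can contain neighbors,
        # because each cell is at least `cutoff` wide.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs
```

Because each particle is only compared against particles in adjacent cells, the number of distance checks grows roughly linearly with the particle count for evenly distributed particles, instead of quadratically as in an all-pairs loop.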
Journal article
Dynamic Low-Latency Distributed Event Processing of Sensor Data Streams
(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Mutschler, Christopher; Philippsen, Michael
Event-based systems (EBS) are used to detect meaningful events with low latency in surveillance, sports, finance, etc. However, with rising data and event rates and with correlations among these events, processing can no longer be sequential but needs to be distributed. Naively distributing existing approaches not only causes failures, because their order-less processing of events cannot deal with the ubiquity of out-of-order event arrival; it also makes it hard to achieve a minimal detection latency. This paper illustrates the combination of our building blocks towards a scalable publish/subscribe-based EBS that analyzes high data rate sensor streams with low latency: a parameter calibration to put out-of-order events in order without a priori knowledge of event delays, a runtime migration of event detectors across system resources, and an online optimization algorithm that uses migration to improve performance. We evaluate our EBS and its building blocks on position data streams from a Realtime Locating System in a sports application.
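To make the out-of-order problem concrete, here is a minimal Python sketch of a reorder buffer that holds events back by a fixed slack before releasing them in timestamp order. It is a simplification for illustration only: the paper calibrates its parameter at runtime without prior knowledge of delays, whereas the class name `SlackBuffer` and the fixed slack `k` here are assumptions.

```python
import heapq


class SlackBuffer:
    """Reorder events by timestamp, delaying output by a fixed slack `k` (assumed)."""

    def __init__(self, k):
        self.k = k
        self.heap = []               # min-heap of (timestamp, payload)
        self.clock = float("-inf")   # largest timestamp seen so far

    def push(self, timestamp, payload):
        """Insert one event; return the events now safe to emit, in order."""
        heapq.heappush(self.heap, (timestamp, payload))
        self.clock = max(self.clock, timestamp)
        ready = []
        # An event is released once the stream has advanced k units past it.
        while self.heap and self.heap[0][0] <= self.clock - self.k:
            ready.append(heapq.heappop(self.heap))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        return [heapq.heappop(self.heap) for _ in range(len(self.heap))]
```

The trade-off is visible in `k`: a larger slack tolerates longer delays but adds detection latency, while a smaller slack emits sooner at the risk of releasing an event before a late, earlier-stamped one arrives.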
Journal article
Energy-Efficient Static Scheduling of Streaming Task Collections with Malleable Tasks
(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Kessler, Christoph; Eitschberger, Patrick; Keller, Jörg
We investigate the energy efficiency of streaming task collections with parallelizable or malleable tasks on a manycore processor with frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level. A stream of data flows through the tasks, and intermediate results are forwarded to other tasks as in a pipelined task graph. We first show the equivalence of task mapping for streaming task collections and normal task collections in the case of continuous frequency scaling, under reasonable assumptions for the user-level scheduler, if a makespan, i.e., a throughput requirement of the streaming application, is given and the energy consumed is to be minimized. We then show that in the case of discrete frequency scaling, it might be necessary for processors to switch frequencies, and that idle times can still occur, in contrast to continuous frequency scaling. We formulate the mapping of (streaming) task collections on a manycore processor with discrete frequency levels as an integer linear program. Finally, we propose two heuristics to reduce energy consumption compared to the previous results by improved load balancing through the parallel execution of a parallelizable task. We evaluate the effects of the heuristics analytically and experimentally on the Intel SCC.
Journal article
Fast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs
(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Hofmann, Johannes; Fey, Dietmar
We use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each other, and display different requirements for the underlying hardware as well as software. We can use these motives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thus, considering time-to-solution, EAs should be run on GPUs whenever possible.