Auflistung PARS-Mitteilungen 2013 nach Erscheinungsdatum
1 - 10 von 21
Treffer pro Seite
Sortieroptionen
- ZeitschriftenartikelDynamic Low-Latency Distributed Event Processing of Sensor Data Streams(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Mutschler, Christopher; Philippsen, MichaelEvent-based systems (EBS) are used to detect meaningful events with low latency in surveillance, sports, finances, etc. However, with rising data and event rates and with correlations among these events, processing can no longer be sequential but it needs to be distributed. However, naively distributing existing approaches not only cause failures as their order-less processing of events cannot deal with the ubiquity of out-of-order event arrival. It is also hard to achieve a minimal detection latency. This paper illustrates the combination of our building blocks towards a scalable publish/subscribe-based EBS that analyzes high data rate sensor streams with low latency: a parameter calibration to put out-of-order events in order without a-priori knowledge on event delays, a runtime migration of event detectors across system resources, and an online optimization algorithm that uses migration to improve performance. We evaluate our EBS and its building blocks on position data streams from a Realtime Locating System in a sports application.
- ZeitschriftenartikelReduktion von False-Sharing in Software-Transactional-Memory(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Kempf, Stefan; Veldema, Ronald; Philippsen, MichaelSoftware-Transactional-Memory (STM) erleichtert das parallele Programmieren jedoch hat STM noch einen zu hohen Laufzeitaufwandda gegenseitiger Ausschluss beim Zugriff auf gemeinsame Daten meist mittels einer Lock-Tabelle fester Größe realisiert wird. Für Programme mit wenigen konkurrierenden Zugriffen und überwiegend Lesezugriffen ist diese Tabelle größer als notwendig, so dass beim Commit einer Transaktion mehr Locks zur Konsistenzprüfung zu inspizieren sind als nötig. Für große Datenmengen ist die Tabelle zu klein. Dann begrenzt False-Sharing (unterschiedliche Adressen werden auf das gleiche Lock abgebildet) die Parallelität, da sogar unabhängige Transaktionen sich gegenseitig ausschließen. Diese Arbeit beschreibt eine Technikdie die Lock-Tabelle bei False-Sharing vergrößert. Zusätzlich kann ein Programmierer mit Annotationen unterschiedliche Lock-Tabellen für voneinander unabhängige Daten verlangen, was die Möglichkeit von False-Sharing und den Speicherbedarf für die Locks weiter verringert In Benchmarks erreichen wir einen maximalen Speedup von 10.3 gegenüber TL2wobei die Lock-Tabelle bis zu 1024 mal kleiner ist.
- ZeitschriftenartikelReal-time Simulation of Human Vision using Temporal Compositing with CUDA on the GPU(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Nießner, Matthias; Kuhnert, Nadine; Selgrad, Kai; Stamminger, Marc; Michelson, GeorgWe present a novel approach that simulates human vision including visual defects such as glaucoma by temporal composition of human vision in real-time on the GPU. Therefore, we determine eye focus points every time step and adapt the lens accommodation of our virtual eye model accordingly. The focal distance is then used to determine bluriness of observed scene regions; i.e., we compute defocus for all visible pixels. In order to simulate the visual memory we introduce a sharpness field where we integrate defocus values temporally. That allows for memorizing sharply perceived scene points. For visualization, we ray trace the virtual scene environment while incorporating depth of field based on the sharpness field data. Thus, our algorithm facilitates the simulation of human vision mimicing the visual memory. We consider this to be particularly useful for illustration purposes for patients with visual defects such as glaucoma. In order to run our algorithm in real-time we employ massively parallel graphics hardware.
- ZeitschriftenartikelA highly configurable and efficient simulator for job schedulers on supercomputers(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Karbach, CarstenThe supercomputers maintained by the Jülich Supercomputing Centre (JSC) are run in batch mode and are shared among all active users. Jobs are submitted to the job schedulerwhich determines for each job the time of execution. It manages run- ning and waiting jobssorts them by a given priority and executes the jobs by fitting them into the current system state. For users it is often difficult to comprehend the job scheduler’s algorithm. Thusa prediction of the future system allocation of cur- rently running and queued jobs is valuable. It helps users to plan job submissions and supports administrators by optimising the system load. This paper summarises the design and implementation of a configurable simulation for various job schedulers. The developed simulation program called JuFo focuses on the job schedulers Loadleveler and Moab as these are the most important batch sys- tems for the supercomputers provided by the JSC. A major design goal is to keep the simulation independent from special job schedulerswhich is achieved by the generic configuration of JuFo. Finallya test framework is developedwhich allows for evalu- ating the accuracy of the generated schedules.
- ZeitschriftenartikelAcceleration of Optical Flow Computations on Tightly-Coupled Processor Arrays(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Sousa, Éricles Rodrigues; Tanase, Alexandru; Lari, Vahid; Hannig, Frank; Teich, Jürgen; Paul, Johny; Stechele, Walter; Kröhnert, Manfred; Asfour, TaminOptical flow is widely used in many applications of portable mobile de- vices and automotive embedded systems for the determination of motion of objects in a visual scene. Also in roboticsit is used for motion detection, object segmentation, time-to-contact information, focus of expansion calculations, robot navigation, and automatic parking for vehicles. Similar to many other image processing algorithms, optical flow processes pixel operations repeatedly over whole image frames. Thusit provides a high degree of fine-grained parallelism which can be efficiently exploited on massively parallel processor arrays. In this contextwe propose to accelerate the computation of complex motion estimation vectors on programmable tightly-coupled processor arrays, which offer a high flexibility enabled by coarse-grained reconfiguration capabilities. Novel is also that the degree of parallelism may be adapted to the number of processors that are available to the application. Finallywe present an implementation that is 18 times faster when compared to (a) an FPGA-based soft processor implementationand (b) may be adapted regarding different QoS requirements, hence, being more flexible than a dedicated hardware implementation.
- ZeitschriftenartikelEnergy-Efficient Static Scheduling of Streaming Task Collections with Malleable Tasks(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Kessler, Christoph; Eitschberger, Patrick; Keller, JörgWe investigate the energy-efficiency of streaming task collections with parallelizable or malleable tasks on a manycore processor with frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin on user level. A stream of data flows through the tasks and intermediate results are forwarded to other tasks like in a pipelined task graph. We first show the equivalence of task mapping for streaming task collections and normal task collections in the case of continuous frequency scaling, under reasonable assumptions for the user-level scheduler, if a makespan, i.e. a throughput requirement of the streaming application, is given and the energy consumed is to be minimized. We then show that in the case of discrete frequency scaling, it might be necessary for processors to switch frequencies, and that idle times still can occur, in contrast to continuous frequency scaling. We formulate the mapping of (streaming) task collections on a manycore processor with discrete frequency levels as an integer linear program. Finally, we propose two heuristics to reduce energy consumption compared to the previous results by improved load balancing through the parallel execution of a parallelizable task. We evaluate the effects of the heuristics analytically and experimentally on the Intel SCC.
- ZeitschriftenartikelSelf-organizing Core Allocation(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Ziermann, Tobias; Wildermann, Stefan; Teich, JürgenThis paper deals with the problem of dynamic allocation of cores to parallel applications on homogeneous many-core systems such as, for example, MultiProcessor System-on-Chips (MPSoCs). For a given number of thread-parallel applications, the goal is to find a core assignment that maximizes the average speedup. However, the difficulty is that some applications may have a higher speedup variation than others when assigned additional cores. This paper first presents a centralized algorithm to calculate an optimal assignment for the above objective. However, as the number of cores and the dynamics of applications will significantly increase in the future, decentralized concepts are necessary to scale with this development. Therefore, a decentralized (self-organizing) algorithm is developed in order to minimize the amount of global information that has to be exchanged between applications. The experimental results show that this approach can reach the optimal result of the centralized version in average by 98.95%.
- ZeitschriftenartikelFast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Hofmann, Johannes; Fey, DietmarWe use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each otherand display different requirements for the underlying hardware as well as software. We can use these mo- tives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thusconsidering time-to-solutionEAs should be run on GPUs whenever possible.
- ZeitschriftenartikelParallel Prefiltering for Accelerating HHblits on the Convey HC-1(PARS-Mitteilungen: Vol. 30, Nr. 1, 2013) Bromberger, Michael; Nowak, FabianHHblits is a bioinformatics application for finding proteins with common ances- tors. To achieve more sensitivitythe protein sequences of the query are not compared directly against the database protein sequencesbut rather their Hidden Markov Models are compared. ThusHHblits is very time-consuming and therefore needs to be accelerated. A multi-FPGA system such as the Convey HC-1 is a promising candiate to achieve acceleration. We present the design and implementation of a parallel coprocessor on the Convey HC-1 to accelerate HHblits after analyzing the application toward acceleration candidates. We achieve a speedup of 117.5× against a sequential implementation for FPGA-suitable data sizes per kernel and negligible speedup for the entire uniprot20 protein database against an optimized SSE implementation.
- ZeitschriftenartikelFast Evolutionary Algorithms: Comparing High Performance Capabilities of CPUs and GPUs(PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Hofmann, Johannes; Fey, DietmarWe use Evolutionary Algorithms (EAs) to evaluate different aspects of high performance computing on CPUs and GPUs. EAs have the distinct property of being made up of parts that behave rather differently from each other, and display different requirements for the underlying hardware as well as software. We can use these motives to answer crucial questions for each platform: How do we make best use of the hardware using manual optimization? Which platform offers the better software libraries to perform standard operations such as sorting? Which platform has the higher net floating-point performance and bandwidth? We draw the conclusion that GPUs are able to outperform CPUs in all categories; thus, considering time-to-solution, EAs should be run on GPUs whenever possible.
- «
- 1 (current)
- 2
- 3
- »