Listing of PARS-Mitteilungen 2017 by Title
- Journal article: Cache-Partitionierung im Kontext von Co-Scheduling [Cache Partitioning in the Context of Co-Scheduling]
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Weidendorfer, Josef; Trinitis, Carsten; Rückerl, Sebastian; Klemm, Michael
  Recent multi-core architectures that give all cores access to a shared cache are able to partition this cache dynamically among the cores. Partitioning is intended to let cloud providers rent individual cores to customers without the customers' workloads interfering with one another and without opening side channels through which data could be leaked. Cache partitioning can also be used profitably to run several high-performance computing applications on a multi-core architecture so that they do not disturb one another at the shared cache level and each dynamically receives a suitably sized share of the cache. This contribution presents first results on cache partitioning using Cache Allocation Technology (CAT) and its effects on co-scheduling strategies in high-performance computing.
- Journal article: Design of MPI Passive Target Synchronization for a Non-Cache-Coherent Many-Core Processor
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Christgau, Steffen; Schnor, Bettina
  Distributed hash tables are a common approach for fast data access. For this kind of application, a synchronization scheme with readers-writers semantics is well suited. This paper presents the design of an implementation of MPI passive target synchronization with readers-writers semantics. The implementation is discussed for the Single-Chip Cloud Computer, a non-cache-coherent many-core CPU with shared memory.
- Journal article: Design Space Exploration Including Approximate Computing for OpenCL-based Stereo Vision Hardware
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Bromberger, Michael; Ehrle, Steffen; Scharrer, Michael; Erlinghagen, Lukas; Schick, Jens
  Calculating distances from objects to a subject, for instance a car, is a central task in many applications. Such distances can be calculated by stereo vision, which exploits stereo camera images. The high complexity of this approach, which has to be performed under high-performance and low-power constraints, limits its wide use. Hardware acceleration is a promising way to meet these constraints. Two main approaches exist: local ones work on a pixel-wise scheme, and global ones consider all pixels at the same time, which greatly increases memory and time complexity. Several optimization methods exist to find Pareto-optimal designs in the design space spanned by accuracy, performance, and resource consumption. Besides well-known techniques, we design, implement, and evaluate new methods, which include the current research trend of approximate computing. In this paper we therefore evaluate different optimization techniques at the OpenCL level for local as well as semi-global approaches. While we target resource reduction for local approaches, we tackle the memory issue of semi-global approaches. We implement all methods on a low-power, low-cost FPGA-based system on chip and evaluate them on available benchmarks as well as in a real-world scenario. The novel semi-global approximate computing design provides a high frame rate, supports a high number of disparities, and achieves good accuracy on typical traffic scenes.
- Journal article: Development and implementation of a temperature monitoring system for HPC systems
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Baumann, Martin; Gebhart, Fabian; Mattes, Oliver; Nikas, Sotirios; Heuveline, Vincent
  In the context of high-performance computing (HPC), the removal of released heat is a challenging topic because the density of computing power keeps increasing. A temperature monitoring system provides insight into the heat development of an HPC cluster. Its effectiveness is directly related to the number of sensors, their placement, and the accuracy of the temperature measurements. Monitoring is important not only to investigate the efficiency of the cooling system and to detect defective operation of the HPC system, but also to improve the cooling of the servers and thereby the achievable performance. The main purpose of fine-grained, unified temperature monitoring is the possibility to optimize applications and their execution with regard to the temperature distribution on HPC systems. Based on this, we present a monitoring infrastructure that is highly flexible and scalable (in terms of cable length and number of sensors) and at the same time budget-friendly. It is based on low-cost components such as a Raspberry Pi as monitoring client and a setup using the DS18B20 digital thermometer as temperature sensor. The focus is on the selection of adequate temperature sensors, and we explain in detail how the sensors are assembled and how quality assurance is done before they are used in the monitoring setup.
- Journal article: A Distributed Hash Table using One-sided Communication in MPI
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Sobe, Peter; Graupner, Tom; Hennig, Florian
  The Message Passing Interface (MPI) can be applied to implement data structures that are distributed across process memory, such as distributed arrays or hash tables. This paper describes a hash table implementation that employs one-sided communication for collision-free access. Collisions of data entries within the hash table are handled using dynamic overflow memory and two-sided communication. This leads to a two-level communication architecture that combines one-sided and two-sided operations in a data structure and the related access operations. The approach circumvents the problem of dynamic and unforeseen size and arrangement of data structures in shared memory, which would be hard to manage using one-sided communication alone.
- Journal article: Efficient Simulation of PRAM Algorithms on Shared Memory Machines
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Berr, Nicolas
  The parallel random-access machine (PRAM) is an abstract shared memory register machine used in computer science to model the algorithmic performance of parallel algorithms. Although it has been used as a theoretical model for many years, only a few attempts have been made to prove the technical feasibility of the model for use in real-world applications. One of these attempts was the SB-PRAM Project, which included the development of real PRAM hardware, a high-level PRAM programming language, and a compiler. It offered programmers the ability to implement algorithms designed for a PRAM in a natural way. Today, the hardware-based prototype no longer exists, but simulation software is still available. Even though the simulated hardware contains a huge amount of inherent parallelism, it turned out to be hard to provide an efficient parallel implementation of the simulation. This article presents a promising new approach to this problem, together with its implementation and evaluation. Experiments show that the approach has high potential for efficiency and reveal further potential that future work can exploit.
- Journal article: Evaluating the Influence of Data Type Precision On Numerical Algorithms
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Bromberger, Michael; Hoffmann, Markus; Hampp, Andreas
  IEEE 32- or 64-bit floating-point arithmetic is often sufficient for different kinds of algorithms, including scientific applications. However, there is a growing body of applications in which significant computational errors during the calculation lead to incorrect results. Such applications range from numerical algorithms and probabilistic timing analysis to long-running simulations. While designing numerically stable algorithms or using interval arithmetic are possible solutions for certain problems, most scientific programmers are not familiar with such deep numerical analyses. In addition, not all issues can be solved using these methods. High-precision arithmetic, which is provided by software libraries or coprocessor designs, is a promising way to overcome these numerical issues. We therefore investigate the influence of data type precision on a numerical algorithm, namely the Lanczos algorithm, and compare different high-precision arithmetic software libraries regarding accuracy and execution time. Additionally, we examine the use of an exact scalar product for the Lanczos algorithm. While we show that high-precision arithmetic is crucial for numerical algorithms, such arithmetic is still far slower than hardware-supported data types.
- Journal article: An Image Processing Operator Language for Design and Synthesis of Smart Camera Architectures
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Hartmann, Christian; Häublein, Konrad; Pfundt, Benjamin; Reichenbach, Marc; Fey, Dietmar
  Recent trends show a rise of heterogeneous hardware architectures for image processing applications. Because these camera systems are used in the embedded field, reducing area and power consumption has become essential. Standard CPUs are not suitable in the embedded field because of their high power and area consumption, and embedded applications have strict constraints on these parameters. Optimized and specialized hardware is therefore required, resulting in a heterogeneous system architecture. Designing such a system is a challenging and error-prone task. The design process requires both software and hardware skills, including programming skills in different programming and design languages. To reduce this complexity, a common language that can easily be mapped onto different hardware architectures, combined with a synthesis framework, is needed. With the Image Processing Operator Language (IPOL), the description of heterogeneous systems with one language becomes possible. The synthesis framework called Image Processing Architecture Synthesis (IPAS) complements the domain-specific language (DSL) as an underlying mapping methodology.
- Journal article: LAIK: A Library for Fault Tolerant Distribution of Global Data for Parallel Applications
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Weidendorfer, Josef; Yang, Dai; Trinitis, Carsten
  HPC applications are usually not written in a way that lets them cope with dynamic changes in the execution environment, such as removing or integrating nodes or node components. For higher flexibility with regard to scheduling and fault tolerance strategies, however, an adequate application-integrated reaction would be worthwhile. With legacy MPI codes, this is difficult to achieve. In this paper, we present Lightweight Application-Integrated data distribution for parallel worKers (LAIK), a lightweight library for distributed index spaces and associated data containers for parallel programs that supports fault tolerance features. By giving LAIK control over data and its partitioning, the library can free compute nodes before they fail and perform replication for rollback schemes on demand. Applications become more adaptive to changes in available resources. We show a simple example that integrates our LAIK library and present first results on a prototype implementation.
- Journal article: Minimizing Energy Cost in Task-Graph Execution on Parallel Platforms
  (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Gerhards, Rainer; Keller, Jörg
  We investigate the minimization of energy cost for the execution of statically scheduled task graphs on parallel machines with frequency scaling and given deadlines, assuming that the power profile of the processing elements and the energy price curve over time are known or can be predicted. We present both a mixed integer linear program and a heuristic to solve this problem, using time slots of fixed length and discrete frequency levels for both approaches and a fixed budget per time slot for the heuristic. We evaluate the heuristic by comparison to cost-optimal schedules. For price curves occurring in practice, and for deadlines not too close to the minimum makespan, the heuristic incurs about 15% higher energy cost than the optimal solution.
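
As a rough illustration of the cost model named in the last entry above (fixed-length time slots, discrete frequency levels, a known power profile and price curve), the following Python sketch sums the energy cost of one processing element over a schedule. It is not the authors' mixed integer linear program or heuristic. The function `energy_cost`, the power profile, the prices, and the schedules are all hypothetical values chosen for illustration.

```python
from typing import Mapping, Sequence


def energy_cost(
    schedule: Sequence[float],
    power_profile: Mapping[float, float],
    prices: Sequence[float],
    slot_hours: float = 1.0,
) -> float:
    """Return the energy cost of one processing element over all time slots.

    schedule[i]      frequency level (GHz) used in slot i; 0.0 means idle
    power_profile[f] power draw in watts at frequency level f (assumed known)
    prices[i]        energy price per kWh in slot i (assumed known or predicted)
    """
    if len(schedule) != len(prices):
        raise ValueError("schedule and prices must cover the same slots")
    total = 0.0
    for freq, price in zip(schedule, prices):
        kwh = power_profile[freq] * slot_hours / 1000.0  # W * h -> kWh
        total += kwh * price
    return total


# Hypothetical power profile (watts per frequency level) and price curve.
POWER = {0.0: 10.0, 1.0: 50.0, 2.0: 120.0}
PRICES = [0.30, 0.10, 0.20]

# Both schedules run one slot at 2 GHz and one at 1 GHz, so they execute
# the same number of cycles; only the placement in time differs.
naive = energy_cost([2.0, 1.0, 0.0], POWER, PRICES)
shifted = energy_cost([1.0, 2.0, 0.0], POWER, PRICES)
print(f"naive: {naive:.3f}, shifted: {shifted:.3f}")
```

With these made-up numbers, moving the high-frequency slot into the cheapest price slot lowers the cost without changing the amount of work. A scheduler of the kind the abstract describes would search for such placements subject to task dependencies and deadlines.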