Listing of PARS-Mitteilungen 2017 by publication date
- Journal article: Cache-Partitionierung im Kontext von Co-Scheduling [Cache Partitioning in the Context of Co-Scheduling] (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Weidendorfer, Josef; Trinitis, Carsten; Rückerl, Sebastian; Klemm, Michael. Recent multi-core architectures that provide all cores with a shared cache are able to partition this cache dynamically among the cores. Partitioning is intended to let cloud providers rent individual cores to customers without their workloads interfering with one another and without opening side channels through which data could be extracted. However, cache partitioning can also be used to good effect to run several high-performance computing applications on multi-core architectures so that they do not disturb each other at the shared cache level and each dynamically has a suitably sized share of the cache at its disposal. This article presents first results on cache partitioning using Cache Allocation Technology (CAT) and its effects on co-scheduling strategies in high-performance computing.
- Journal article: Predicting Efficient Execution with Source Code Analysis in a Heterogeneous Environment (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Hellwig, Markus; Becker, Thomas. Finding a good schedule for the tasks of an application is a critical step for the efficient usage of heterogeneous systems. A good schedule can only be found with information about the tasks to be scheduled. In a dynamic system, this information is normally available only after each task has been executed at least once, which creates an initial overhead until a good schedule can be found. Therefore, we introduce a method based on static code analysis and machine learning algorithms that predicts, by classification, the fastest processor for a given OpenCL task before runtime, which helps to reduce this initial overhead. We show how we used a static code analysis implementation based on Clang to generate training data on a set of 10 different heterogeneous processors including Intel, AMD and Nvidia GPUs, an Intel Xeon Phi and Intel CPUs. This training data was used to generate prediction models with several machine learning algorithms, including Random Forest and k-Nearest Neighbour. The models were then evaluated by predicting, via classification, the fastest processor out of two or more processors.
- Journal issue: PARS-Mitteilungen 2017 (PARS-Mitteilungen: Vol. 34, No. 1, 2017)
- Journal article: Efficient Simulation of PRAM Algorithms on Shared Memory Machines (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Berr, Nicolas. The parallel random-access machine (PRAM) is an abstract shared-memory register machine used in computer science to model the algorithmic performance of parallel algorithms. Although it has been used as a theoretical model for many years, only a few attempts have been made to prove the technical feasibility of the model for use in real-world applications. One of these attempts was the SB-PRAM project, which included the development of real PRAM hardware, a high-level PRAM programming language and a compiler. It offered programmers the ability to implement algorithms designed for a PRAM in a natural way. Today, the hardware-based prototype no longer exists, but simulation software is still available. Even though the simulated hardware contains a huge amount of inherent parallelism, it turned out to be hard to provide an efficient parallel implementation of the simulation. This article presents a promising new approach to this problem, together with its implementation and evaluation. Experiments show the approach's high potential for efficiency and reveal further potential that future work can exploit.
- Journal article: LAIK: A Library for Fault Tolerant Distribution of Global Data for Parallel Applications (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Weidendorfer, Josef; Yang, Dai; Trinitis, Carsten. HPC applications are usually not written in a way that lets them cope with dynamic changes in the execution environment, such as removing or integrating nodes or node components. For higher flexibility with regard to scheduling and fault tolerance strategies, an adequate application-integrated reaction would be worthwhile, but this is difficult to achieve with legacy MPI codes. In this paper, we present Lightweight Application-Integrated data distribution for parallel worKers (LAIK), a lightweight library for distributed index spaces and associated data containers for parallel programs that supports fault tolerance features. By giving LAIK control over data and its partitioning, the library can free compute nodes before they fail and perform replication for rollback schemes on demand. Applications become more adaptive to changes in available resources. We show a simple example that integrates our LAIK library and present first results from a prototype implementation.
- Journal article: Design of MPI Passive Target Synchronization for a Non-Cache-Coherent Many-Core Processor (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Christgau, Steffen; Schnor, Bettina. Distributed hash tables are a common approach for fast data access. For this kind of application, a synchronization scheme with readers-writers semantics is well suited. This paper presents the design of an implementation of MPI passive target synchronization with readers-writers semantics. The implementation is discussed for the Single-Chip Cloud Computer, a non-cache-coherent many-core CPU with shared memory.
- Journal article: An Image Processing Operator Language for Design and Synthesis of Smart Camera Architectures (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Hartmann, Christian; Häublein, Konrad; Pfundt, Benjamin; Reichenbach, Marc; Fey, Dietmar. Recent trends show a rise of heterogeneous hardware architectures for image processing applications. Because these camera systems are used in the embedded field, reducing area and power consumption has become essential. Standard CPUs are not suitable in the embedded field because of their high power and area consumption, while embedded applications have strict constraints on these parameters. Therefore, optimized and specialized hardware is required, which results in a heterogeneous system architecture. Designing such a system is a challenging and error-prone task. The design process requires both software and hardware skills, as well as proficiency in different programming and design languages. To reduce this complexity, a common language that can easily be mapped onto different hardware architectures is needed, combined with a synthesis framework. With the Image Processing Operator Language (IPOL), the description of heterogeneous systems in a single language becomes possible. The synthesis framework, called Image Processing Architecture Synthesis (IPAS), complements the domain-specific language (DSL) as the underlying mapping methodology.
- Journal article: Development and implementation of a temperature monitoring system for HPC systems (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Baumann, Martin; Gebhart, Fabian; Mattes, Oliver; Nikas, Sotirios; Heuveline, Vincent. In high-performance computing (HPC), removing the dissipated heat is a challenging problem because of the continuously increasing density of computing power. A temperature monitoring system provides insight into the heat development of an HPC cluster. Its effectiveness is directly related to the number of sensors, their placement and the accuracy of the temperature measurements. Monitoring is important not only to investigate the efficiency of the cooling system in order to detect defective operation of the HPC system, but also to improve the cooling of the servers and thereby the achievable performance. The main purpose of fine-grained and unified temperature monitoring is to make it possible to optimize applications and their execution with respect to the temperature distribution on HPC systems. Based on this, we present a monitoring infrastructure that is highly flexible and scalable in terms of cable length and number of sensors, and at the same time budget-friendly. It is based on low-cost components: a Raspberry Pi as monitoring client and a setup using the DS18B20 digital thermometer as temperature sensor. We focus on the selection of adequate temperature sensors and explain in detail how the sensors are assembled and how quality assurance is carried out before they are used in the monitoring setup.
- Journal article: Minimizing Energy Cost in Task-Graph Execution on Parallel Platforms (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Gerhards, Rainer; Keller, Jörg. We investigate the minimization of energy cost for executing statically scheduled task graphs on parallel machines with frequency scaling and given deadlines, assuming that the power profile of the processing elements and the energy price curve over time are known or can be predicted. We present both a mixed integer linear program and a heuristic to solve this problem. Both approaches use fixed-length time slots and discrete frequency levels, and the heuristic additionally uses a fixed budget per time slot. We evaluate the heuristic by comparison with cost-optimal schedules. For price curves occurring in practice, and for deadlines not too close to the minimum makespan, the heuristic produces about 15% more energy cost than the optimal solution.
- Journal article: Evaluating the Influence of Data Type Precision On Numerical Algorithms (PARS-Mitteilungen: Vol. 34, No. 1, 2017). Bromberger, Michael; Hoffmann, Markus; Hampp, Andreas. IEEE 32- or 64-bit floating-point arithmetic is often sufficient for many kinds of algorithms, including scientific applications. However, there is a growing body of applications in which significant computational errors accumulate during the calculation and lead to incorrect results. Such applications range from numerical algorithms and probabilistic timing analysis to long-time simulations. While numerically stable algorithm design and interval arithmetic are possible solutions for certain problems, most scientific programmers are not familiar with such deep numerical analyses. In addition, not all issues can be solved with these methods. High precision arithmetic, which is provided by software libraries or coprocessor designs, is a promising way to overcome these numerical issues. Therefore, we investigate the influence of data type precision on a numerical algorithm, namely the Lanczos algorithm, and compare different high precision arithmetic software libraries with regard to accuracy and execution time. Additionally, we examine the use of an exact scalar product for the Lanczos algorithm. While we show that high precision arithmetic is crucial for numerical algorithms, such arithmetic is still far slower than hardware-supported data types.
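
The effect described in the last abstract above (the result of a scalar product depending on data type precision, and an exact scalar product avoiding the error) can be illustrated with a minimal sketch. This is not the authors' code and does not implement the Lanczos algorithm. The function names `to_float32`, `naive_dot` and `exact_dot` are hypothetical, 32-bit arithmetic is only emulated by rounding after every operation, and the exact product is assumed to be computed with Python's rational arithmetic rather than with any of the libraries compared in the article.

```python
import struct
from fractions import Fraction


def to_float32(x):
    """Round a Python float (IEEE binary64) to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_dot(xs, ys, rnd=lambda v: v):
    """Left-to-right scalar product; `rnd` is applied after every operation.

    With the default `rnd` this is plain 64-bit arithmetic; passing
    `to_float32` emulates accumulating in 32-bit arithmetic.
    """
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = rnd(acc + rnd(x * y))
    return acc


def exact_dot(xs, ys):
    """Scalar product in exact rational arithmetic, rounded once at the end."""
    total = sum(Fraction(x) * Fraction(y) for x, y in zip(xs, ys))
    return float(total)


ones = [1.0, 1.0, 1.0]

# Cancellation that 64-bit arithmetic survives but 32-bit arithmetic does not.
small = [1e8, 1.0, -1e8]
print(naive_dot(small, ones))              # 64-bit accumulation
print(naive_dot(small, ones, to_float32))  # emulated 32-bit accumulation
print(exact_dot(small, ones))              # exact, rounded once

# The same pattern at a larger magnitude also defeats 64-bit arithmetic.
large = [1e16, 1.0, -1e16]
print(naive_dot(large, ones))
print(exact_dot(large, ones))
```

In both data sets the mathematically correct result is 1. The sketch trades speed for accuracy in the exact variant, which matches the abstract's observation that high precision arithmetic is much slower than hardware-supported data types.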