Browsing by Author "Bromberger, Michael"
- Journal article: An Architecture Framework for Porting Applications to FPGAs (PARS-Mitteilungen: Vol. 31, No. 1, 2014) Nowak, Fabian; Bromberger, Michael; Karl, Wolfgang. High-level language converters help create FPGA-based accelerators and allow developers to rapidly come up with a working prototype. But the generated state machines often do not perform as well as hand-designed control units, and they require much area. Also, the created deep pipelines are not very efficient for small amounts of data. Our approach is an architecture framework of hand-coded building blocks (BBs). A microprogrammable control unit allows programming the BBs to perform computations in a data-flow style. We accelerate applications further by executing independent tasks in parallel on different BBs. Our microprogram implementation for the Conjugate-Gradient method on our data-driven, microprogrammable, task-parallel architecture framework on the Convey HC-1 is competitive with a 24-thread Intel Westmere system. It is 1.2× faster using only one out of four available FPGAs, thereby proving its potential for accelerating numerical applications. Moreover, we show that hardware developers can change the BBs and thereby reduce the iteration count of a numerical algorithm like the Conjugate-Gradient method to less than 0.5× due to more precise operations inside the BBs, speeding up execution time 2.47×.
- Journal article: Design Space Exploration Including Approximate Computing for OpenCL-based Stereo Vision Hardware (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Bromberger, Michael; Ehrle, Steffen; Scharrer, Michael; Erlinghagen, Lukas; Schick, Jens. Calculating distances from objects to a subject, for instance a car, is a central task in many applications. Such distances can be calculated by stereo vision exploiting stereo camera images. The high complexity of this approach, which has to be performed under high-performance and low-power constraints, limits its widespread use. Hardware acceleration is a promising solution to meet the above constraints. Two main approaches exist: local ones work on a pixel-wise scheme, whereas global ones consider all pixels at the same time, which greatly increases the memory and time complexity. Several optimization methods exist to find Pareto-optimal designs in the design space spanned by accuracy, performance, and resource consumption. Besides well-known techniques, we design, implement, and evaluate new methods, which include the current research trend of approximate computing. Therefore, in this paper we evaluate different optimization techniques on an OpenCL level for local as well as semi-global approaches. While we target resource reduction for local approaches, we tackle the memory issue of semi-global approaches. We implement all methods on a low-power and low-cost FPGA-based system on chip and evaluate them on available benchmarks as well as on a real-world scenario. The novel semi-global approximate computing design provides a high frame rate, supports a high number of disparities, and achieves good accuracy on typical traffic scenes.
- Journal article: Evaluating the Influence of Data Type Precision On Numerical Algorithms (PARS-Mitteilungen: Vol. 34, No. 1, 2017) Bromberger, Michael; Hoffmann, Markus; Hampp, Andreas. IEEE 32- or 64-bit floating-point arithmetic is often sufficient for different kinds of algorithms, including scientific applications. However, there is a growing body of applications which have significant computational errors during the calculation, leading to incorrect results. Such applications range from numerical algorithms and probabilistic timing analysis to long-time simulations. While designing numerically stable algorithms or using interval arithmetic are possible solutions for certain problems, most scientific programmers are not aware of such deep numerical analyses. In addition, not all issues can be solved using the above methods. High precision arithmetic, which is provided by software libraries or coprocessor designs, is a promising solution to overcome the above numerical issues. Therefore, we investigate the influence of data type precision on a numerical algorithm, namely the Lanczos algorithm, and compare different high precision arithmetic software libraries regarding accuracy and execution time. Additionally, we examine the usage of an exact scalar product for the Lanczos algorithm. While we show that high precision arithmetic is crucial for numerical algorithms, such arithmetic is still far slower than hardware-supported data types.
- Journal article: Parallel Prefiltering for Accelerating HHblits on the Convey HC-1 (PARS-Mitteilungen: Vol. 30, No. 1, 2013) Bromberger, Michael; Nowak, Fabian. HHblits is a bioinformatics application for finding proteins with common ancestors. To achieve more sensitivity, the protein sequences of the query are not compared directly against the database protein sequences, but rather their Hidden Markov Models are compared. Thus, HHblits is very time-consuming and therefore needs to be accelerated. A multi-FPGA system such as the Convey HC-1 is a promising candidate to achieve acceleration. We present the design and implementation of a parallel coprocessor on the Convey HC-1 to accelerate HHblits after analyzing the application toward acceleration candidates. We achieve a speedup of 117.5× against a sequential implementation for FPGA-suitable data sizes per kernel and negligible speedup for the entire uniprot20 protein database against an optimized SSE implementation.
- Journal article: Parallel Prefiltering for Accelerating HHblits on the Convey HC-1 (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013) Bromberger, Michael; Nowak, Fabian. HHblits is a bioinformatics application for finding proteins with common ancestors. To achieve more sensitivity, the protein sequences of the query are not compared directly against the database protein sequences, but rather their Hidden Markov Models are compared. Thus, HHblits is very time-consuming and therefore needs to be accelerated. A multi-FPGA system such as the Convey HC-1 is a promising candidate to achieve acceleration. We present the design and implementation of a parallel coprocessor on the Convey HC-1 to accelerate HHblits after analyzing the application toward acceleration candidates. We achieve a speedup of 117.5× against a sequential implementation for FPGA-suitable data sizes per kernel and negligible speedup for the entire uniprot20 protein database against an optimized SSE implementation.