Listing by author "Juurlink, Ben"
1 - 8 of 8
- Journal article: Aufbau einer FPGA Cluster Architektur für heterogenes, verteiltes Rechnen [Building an FPGA Cluster Architecture for Heterogeneous, Distributed Computing] (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 28, No. 1, 2011)
  Moser, Nico; Gremzow, Carsten; Juurlink, Ben (Technische Universität Berlin, Germany; Universität Wuppertal, Germany)
- Journal article: A Benchmark Suite for Evaluating Parallel Programming Models (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 28, No. 1, 2011)
  Andersch, Michael; Juurlink, Ben; Chi, Chi
  The transition to multi-core processors forces software developers to explicitly exploit thread-level parallelism to increase performance. The associated programmability problem has led to the introduction of a plethora of parallel programming models that aim at simplifying software development by raising the abstraction level. Since industry has not settled on a single model, however, multiple significantly different approaches exist. This work presents a benchmark suite that can be used to classify and compare such parallel programming models and therefore aids in selecting the appropriate programming model for a given task. After a detailed explanation of the suite's design, preliminary results for two programming models, Pthreads and OmpSs/SMPSs, are presented and analyzed, leading to an outline of further extensions of the suite.
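The programmability gap this abstract describes is easiest to see in code. Below is a minimal, illustrative Pthreads reduction (not taken from the benchmark suite itself): thread creation, work partitioning, and joining are all explicit, which is exactly the boilerplate that task-based models such as OmpSs/SMPSs hide behind annotations.

```c
/* Minimal sketch of explicit Pthreads parallelism; illustrative only,
 * not code from the benchmark suite. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define N 1000000

static double data[N];
static double partial[NUM_THREADS];

/* Each worker sums one contiguous chunk of the array. */
static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = N / NUM_THREADS;
    double sum = 0.0;
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        sum += data[i];
    partial[id] = sum;
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < N; i++) data[i] = 1.0;

    /* Thread creation, work partitioning and joining are all explicit. */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += partial[t];
    }
    printf("sum = %f\n", total);
    return 0;
}
```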
- Journal article: The Evolution of Secure Hash Algorithms (PARS-Mitteilungen: Vol. 35, Nr. 1, 2020)
  Pfautsch, Frederik; Schubert, Nils; Orglmeister, Conrad; Gebhart, Maximilian; Habermann, Philipp; Juurlink, Ben
  Hashing algorithms are a popular tool for storing passwords securely and for verifying files. Storing plain-text passwords is problematic if the database is exposed, but an outdated hashing algorithm is also a problem. Short passwords can be attacked with brute-force search, hence recommendations for a minimum password length are common. Given that computer performance has increased significantly over the last decades, outdated hashes, especially those generated from short passwords, are vulnerable today. We evaluate the resilience of SHA-1 and SHA-3 hashing against brute-force attacks on a 24-core dual-processor system, as well as on a modern UltraScale+ FPGA. Reaching a peak performance of 4.45 Ghash/s, we are able to find SHA-1-hashed passwords with a length of up to six characters within three minutes. This time increases by a factor of 5.5 for the more secure SHA-3 algorithm due to its higher complexity. We furthermore present a study of how the average cracking time grows with increasing password length. To be resilient against brute-force attacks, we therefore recommend a minimum password length of at least 8 characters, which increases the required computing time to several days (SHA-1) or weeks (SHA-3) on average.
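A back-of-the-envelope check of the numbers above. This is a sketch only: the abstract does not state the character set, so the 95-character printable-ASCII alphabet is an assumption.

```c
/* Worst-case brute-force time = charset^length / hash rate. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double rate = 4.45e9;   /* peak SHA-1 throughput: 4.45 Ghash/s */
    const double charset = 95.0;  /* assumed printable-ASCII alphabet */

    for (int len = 4; len <= 9; len++) {
        double keyspace = pow(charset, len);  /* passwords of this length */
        double worst = keyspace / rate;       /* exhaustive search, seconds */
        printf("len %d: worst case %.1f s (%.2f days)\n",
               len, worst, worst / 86400.0);
    }
    /* len 6 gives roughly 165 s, consistent with the "three minutes" above;
     * len 8 already runs into many days, matching the recommendation. */
    return 0;
}
```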
- Journal article: Proximity Scheme for Instruction Caches in Tiled CMP Architectures (PARS-Mitteilungen: Vol. 32, Nr. 1, 2015)
  Alawneh, Tareq; Chi, Chi Ching; Elhossini, Ahmed; Juurlink, Ben
  Recent research results show that there is a high degree of code sharing between cores in multi-core architectures. In this paper we propose a proximity scheme for instruction caches, in which code blocks shared among neighbouring L2 caches in tiled multi-core architectures are exploited to reduce the average cache miss penalty and the on-chip network traffic. We evaluate the proposed proximity scheme using a full-system simulator running an n-core tiled CMP. The experimental results reveal a significant execution time improvement of up to 91.4% for microbenchmarks whose instruction footprint does not fit in the private L2 cache. For real applications from the PARSEC benchmark suite, the proposed scheme results in speedups of up to 8%.
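A hedged sketch of the proximity idea (the data structures and function names below are hypothetical, not from the paper): on an instruction miss in the private L2, nearby tiles are probed for a shared copy before the request pays the full round trip to the directory and memory.

```c
/* Illustrative lookup path for a proximity scheme on a tiled CMP. */
#include <stdbool.h>
#include <stddef.h>

#define NUM_NEIGHBOURS 4  /* e.g. the four mesh neighbours of a tile */

typedef struct tile tile_t;
struct tile {
    tile_t *neighbour[NUM_NEIGHBOURS];
    bool (*l2_has_block)(tile_t *self, unsigned long block_addr);
};

/* Returns the tile that can supply the block, or NULL on a full miss. */
tile_t *proximity_lookup(tile_t *self, unsigned long block_addr) {
    for (int i = 0; i < NUM_NEIGHBOURS; i++) {
        tile_t *n = self->neighbour[i];
        /* A hit in a nearby L2 costs a short network hop instead of a
         * round trip to the directory and off-chip memory. */
        if (n && n->l2_has_block(n, block_addr))
            return n;
    }
    return NULL; /* fall back to the regular coherence/memory path */
}
```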
- Journal article: A Quantitative Analysis of Processor Memory Bandwidth of an FPGA-MPSoC (PARS-Mitteilungen: Vol. 35, Nr. 1, 2020)
  Drehmel, Robert; Göbel, Matthias; Juurlink, Ben
  System designers have to choose between a variety of different memories available on modern FPGA-MPSoCs. Our intention is to shed light on the bandwidth achievable when accessing these memories under diverse circumstances and to hint at their suitability for general-purpose applications. We conducted a systematic quantitative analysis of the memory bandwidth of two processing units using a sophisticated standalone bandwidth measurement tool. The results show a maximum cacheable memory bandwidth of 7.11 GiB/s for reads and 11.78 GiB/s for writes for the general-purpose processing unit, and 2.56 GiB/s for reads and 1.83 GiB/s for writes for the special-purpose (real-time) processing unit. In contrast, the achieved non-cacheable read bandwidth lies between 60 MiB/s and 207 MiB/s, with an outlier of 2.67 GiB/s. We conclude that for most applications, relying on DRAM and hardware cache-coherency management is the best choice in terms of benefit-cost ratio.
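For intuition, a simple streaming read benchmark of the kind behind such numbers might look like the following. This is a plain POSIX userspace sketch, not the paper's standalone tool; buffer size and access pattern are assumptions.

```c
/* Stream through a large buffer and divide bytes moved by elapsed time. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

#define BUF_BYTES (256UL * 1024 * 1024) /* large enough to defeat caches */

int main(void) {
    uint64_t *buf = malloc(BUF_BYTES);
    if (!buf) return 1;
    memset(buf, 1, BUF_BYTES); /* touch all pages before timing */

    struct timespec t0, t1;
    volatile uint64_t sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < BUF_BYTES / sizeof(uint64_t); i++)
        sink += buf[i]; /* sequential read stream */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read bandwidth: %.2f GiB/s (checksum %llu)\n",
           BUF_BYTES / secs / (1024.0 * 1024.0 * 1024.0),
           (unsigned long long)sink);
    free(buf);
    return 0;
}
```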
- Journal article: Real-Time Vision System for License Plate Detection and Recognition on FPGA (PARS-Mitteilungen: Vol. 32, Nr. 1, 2015)
  Rosli, Faird; Elhossini, Ahmed; Juurlink, Ben
  The rapid development of Field Programmable Gate Arrays (FPGAs) offers an alternative way to accelerate computationally intensive tasks such as digital signal and image processing. Their ability to perform parallel processing shows the potential for implementing a high-speed vision system. Out of the numerous applications of computer vision, this paper focuses on the hardware implementation of one that is commercially known as Automatic Number Plate Recognition (ANPR). Morphological operations and Optical Character Recognition (OCR) algorithms have been implemented on a Xilinx Zynq-7000 All-Programmable SoC to realize the functions of an ANPR system. Test results show that the implemented processing pipeline, which consumes 63% of the logic resources, is capable of delivering results with a relatively low error rate. Most importantly, the computation time satisfies the real-time requirement of many ANPR applications.
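As a software stand-in for one stage of such a pipeline, a 3x3 binary erosion/dilation kernel (the classic morphological building block mentioned above) can be sketched as follows; the image size and structuring element are illustrative assumptions, and the paper implements this as FPGA hardware rather than C.

```c
/* Binary morphology with a 3x3 structuring element, as typically used to
 * isolate plate regions before OCR. */
#define W 640
#define H 480

/* in/out are binary images (0 or 1). erode=1 -> erosion, 0 -> dilation.
 * Border pixels are left untouched for brevity. */
static void morph3x3(const unsigned char in[H][W],
                     unsigned char out[H][W], int erode) {
    for (int y = 1; y < H - 1; y++) {
        for (int x = 1; x < W - 1; x++) {
            int acc = erode ? 1 : 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int p = in[y + dy][x + dx];
                    /* erosion ANDs the neighbourhood, dilation ORs it */
                    acc = erode ? (acc && p) : (acc || p);
                }
            out[y][x] = (unsigned char)acc;
        }
    }
}
```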
- Journal article: Reducing DRAM Accesses through Pseudo-Channel Mode (PARS-Mitteilungen: Vol. 35, Nr. 1, 2020)
  Salehiminapour, Farzaneh; Lucas, Jan; Goebel, Matthias; Juurlink, Ben
  Applications once exclusive to high-performance computing are now common in systems ranging from mobile devices to clusters, and they typically require large amounts of memory bandwidth. The graphics DRAM interface standards GDDR5X and GDDR6 are new DRAM technologies that promise almost doubled data rates compared to GDDR5. However, these higher data rates require a longer burst length of 16 words, which would typically increase the memory access granularity. GDDR5X and GDDR6 therefore support a feature called pseudo-channel mode, in which the memory is split into two 16-bit pseudo channels. This split keeps the memory access granularity constant compared to GDDR5. However, the pseudo channels are not fully independent: two accesses can be performed at the same time, but access type, bank, and page must match, while the column address can be selected separately for each pseudo channel. With this restriction, we argue that GDDR5X can best be seen as a GDDR5 memory that allows an additional request to the same page without extra cost. We therefore propose a DRAM buffer scheduling algorithm to make effective use of the pseudo-channel mode and the additional memory bandwidth offered by GDDR5X. Compared to the GDDR5X regular mode, our proposed algorithm achieves a 12.5% to 18% memory access reduction on average in pseudo-channel mode.
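The pairing rule the abstract describes can be sketched as a scheduler predicate. The structures and queue scan below are illustrative, not the paper's actual algorithm: two requests may be issued together only if access type, bank, and page match, while columns may differ.

```c
/* Hedged sketch of request pairing for pseudo-channel mode. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool is_write;
    unsigned bank;
    unsigned row;    /* page */
    unsigned column; /* may differ between the two pseudo channels */
} dram_req_t;

static bool can_pair(const dram_req_t *a, const dram_req_t *b) {
    return a->is_write == b->is_write &&
           a->bank == b->bank &&
           a->row == b->row; /* columns are allowed to differ */
}

/* Given a request chosen for pseudo channel 0, scan the queue for a
 * companion request for pseudo channel 1 that rides along for free. */
static int find_companion(const dram_req_t *chosen,
                          const dram_req_t queue[], size_t n) {
    for (size_t i = 0; i < n; i++)
        if (&queue[i] != chosen && can_pair(chosen, &queue[i]))
            return (int)i;
    return -1; /* no pairable request: issue alone, as plain GDDR5 would */
}
```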
- Journal article: Weight Pruning for Deep Neural Networks on GPUs (PARS-Mitteilungen: Vol. 35, Nr. 1, 2020)
  Hartenstein, Thomas; Maier, Daniel; Cosenza, Biagio; Juurlink, Ben
  Neural networks are getting more complex than ever before, leading to resource-demanding training processes that have been the target of optimization. With embedded real-time applications such as traffic identification in self-driving cars relying on neural networks, inference latency is becoming more important. The size of the model has been identified as an important optimization target, as smaller networks also require fewer computations for inference. A way to shrink a network is to remove small weights: weight pruning. This technique has been exploited in a number of ways and has been shown to significantly lower the number of weights while maintaining accuracy very close to that of the original network. However, current pruning techniques must remove up to 90% of the weights, and thus require a high amount of redundancy in the original network, before inference speeds up, because sparse data structures induce overhead. We propose a novel technique for selecting the weights to be pruned that is specifically designed to take the architecture of GPUs into account. By selecting the weights to be removed in adjacent groups that are aligned to the memory architecture, we are able to fully exploit the memory bandwidth. Our results show that, with the same number of weights removed, our technique speeds up a neural network by a factor of 1.57 at a pruning rate of 90% while maintaining the same accuracy as state-of-the-art pruning techniques.
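A minimal sketch of aligned group pruning under stated assumptions: a group size of 32 consecutive weights (chosen here to match a GPU memory transaction) and an L1-norm selection criterion. The paper's exact group size and criterion may differ.

```c
/* Rank fixed-size groups of adjacent weights and zero out whole groups,
 * so the surviving weights stay aligned to memory transactions. */
#include <math.h>
#include <stdlib.h>
#include <string.h>

#define GROUP 32 /* assumed group size, aligned to a GPU memory transaction */

typedef struct { float norm; size_t idx; } group_t;

static int cmp_group(const void *a, const void *b) {
    float x = ((const group_t *)a)->norm, y = ((const group_t *)b)->norm;
    return (x > y) - (x < y);
}

/* Zero out the fraction `rate` of groups with the smallest L1 norm. */
void prune_groups(float *w, size_t n, double rate) {
    size_t ngroups = n / GROUP;
    group_t *g = malloc(ngroups * sizeof *g);
    if (!g) return;
    for (size_t i = 0; i < ngroups; i++) {
        float s = 0.0f;
        for (int j = 0; j < GROUP; j++)
            s += fabsf(w[i * GROUP + j]); /* per-group L1 norm */
        g[i].norm = s;
        g[i].idx = i;
    }
    qsort(g, ngroups, sizeof *g, cmp_group); /* smallest norms first */
    size_t kill = (size_t)(rate * ngroups);
    for (size_t k = 0; k < kill; k++)
        memset(&w[g[k].idx * GROUP], 0, GROUP * sizeof(float));
    free(g);
}
```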