Journal Article

Weight Pruning for Deep Neural Networks on GPUs

Document Type

Text/Journal Article

Date

2020

Publisher

Gesellschaft für Informatik e.V., Fachgruppe PARS

Abstract

Neural networks are becoming more complex than ever, leading to resource-demanding training processes that have been the target of optimization. With embedded real-time applications such as traffic identification in self-driving cars relying on neural networks, inference latency is becoming more important. The size of the model has been identified as an important target of optimization, as smaller networks also require fewer computations for inference. One way to shrink a network is to remove small weights, a technique known as weight pruning. This technique has been exploited in a number of ways and has been shown to significantly lower the number of weights while maintaining an accuracy very close to that of the original network. However, because sparse data structures induce overhead, current pruning techniques must remove up to 90% of the weights, which requires a high amount of redundancy in the original network, before they speed up inference. We propose a novel technique for selecting the weights to be pruned. Our technique is specifically designed to take the architecture of GPUs into account. By selecting the weights to be removed in adjacent groups that are aligned to the memory architecture, we are able to fully exploit the memory bandwidth. Our results show that, with the same number of weights removed, our technique speeds up a neural network by a factor of 1.57 at a pruning rate of 90% while maintaining the same accuracy when compared to state-of-the-art pruning techniques.
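
The idea of pruning weights in adjacent, aligned groups can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_groups`, the `group_size` parameter, and the choice of scoring each group by the sum of absolute weights are assumptions made for illustration only. The sketch works on a flat list of weights and zeroes out whole groups, so that the surviving weights stay in contiguous blocks.

```python
def prune_groups(weights, group_size, prune_rate):
    """Zero out whole groups of adjacent weights with the smallest L1 norm.

    Hypothetical helper for illustration; the scoring rule and the
    parameters are assumptions, not taken from the article.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")

    n_groups = len(weights) // group_size

    # Score each aligned group by the sum of absolute values (L1 norm).
    scores = [
        sum(abs(w) for w in weights[g * group_size:(g + 1) * group_size])
        for g in range(n_groups)
    ]

    # Number of groups to remove for the requested pruning rate.
    n_prune = int(n_groups * prune_rate)

    # Pick the groups with the smallest scores.
    to_prune = sorted(range(n_groups), key=lambda g: scores[g])[:n_prune]

    # Copy the weights and zero every weight of each selected group.
    pruned = list(weights)
    for g in to_prune:
        start = g * group_size
        pruned[start:start + group_size] = [0.0] * group_size

    return pruned, sorted(to_prune)


if __name__ == "__main__":
    w = [0.9, -0.8, 0.01, 0.02, 0.5, 0.4, -0.03, 0.0]
    pruned, removed = prune_groups(w, group_size=2, prune_rate=0.5)
    print(pruned)
    print(removed)
```

In contrast to pruning individual weights, every removed entry here belongs to a fully zeroed block, which is the property a GPU kernel could exploit to skip whole aligned memory segments. A realistic group size would be chosen to match the hardware (for example, the width of a memory transaction), which this sketch leaves as a free parameter.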

Description

Hartenstein, Thomas; Maier, Daniel; Cosenza, Biagio; Juurlink, Ben (2020): Weight Pruning for Deep Neural Networks on GPUs. PARS-Mitteilungen: Vol. 35, No. 1. Berlin: Gesellschaft für Informatik e.V., Fachgruppe PARS. PISSN: 0177-0454. pp. 51-62
