Weight Pruning for Deep Neural Networks on GPUs

Neural networks are getting more complex than ever before, leading to resource-demanding training processes that have been the target of optimization. With embedded real-time applications such as traffic identification in self-driving cars relying on neural networks, the inference latency is becoming more important. The size of the model has been identified as an important target of optimization, as smaller networks also require less computations for inference. A way to shrink a network in size is to remove small weights: weight pruning. This technique has been exploited in a number of ways and has shown to be able to significantly lower the number of weights, while maintaining a very close accuracy compared to the original network. However, current pruning techniques require the removal of up to 90% of the weights, requiring high amount of redundancy in the original network, to be able to speedup the inference as sparse data structures induce overhead. We propose a novel technique for the selection of the weights to be pruned. Our technique is specifically designed to take the architecture of GPUs into account. By selecting the weights to be removed in adjacent groups that are aligned to the memory architecture, we are able to fully exploit the memory bandwidth. Our results show that with the same amount of weights removed, our technique is able to speedup a neural network by a factor of 1:57 given a pruning rate of 90% while maintaining the same accuracy when compared to state-of-the-art pruning techniques.

Hartenstein, Thomas; Maier, Daniel; Cosenza, Biagio; Juurlink, Ben (2020): Weight Pruning for Deep Neural Networks on GPUs. PARS-Mitteilungen: Vol. 35, Nr. 1. Berlin: Gesellschaft für Informatik e.V., Fachgruppe PARS. PISSN: 0177-0454. pp. 51-62

Sammlungen

PARS-Mitteilungen 2020

Komplettanzeige

Weight Pruning for Deep Neural Networks on GPUs

Volltext URI

Dokumententyp

Dateien

Zusatzinformation

Datum

Autor:innen

Zeitschriftentitel

ISSN der Zeitschrift

Bandtitel

Quelle

Verlag

Zusammenfassung

Beschreibung

Schlagwörter

Zitierform

DOI

Tags

Sammlungen