Weight Pruning for Deep Neural Networks on GPUs

Hartenstein, Thomas; Maier, Daniel; Cosenza, Biagio; Juurlink, Ben

dc.contributor.author	Hartenstein, Thomas
dc.contributor.author	Maier, Daniel
dc.contributor.author	Cosenza, Biagio
dc.contributor.author	Juurlink, Ben
dc.date.accessioned	2020-08-25T09:05:21Z
dc.date.available	2020-08-25T09:05:21Z
dc.date.issued	2020
dc.description.abstract	Neural networks are more complex than ever before, leading to resource-demanding training processes that have been the target of optimization. With embedded real-time applications such as traffic identification in self-driving cars relying on neural networks, inference latency is becoming more important. The size of the model has been identified as an important target of optimization, as smaller networks also require fewer computations for inference. One way to shrink a network is to remove small weights: weight pruning. This technique has been exploited in a number of ways and has been shown to significantly lower the number of weights while maintaining accuracy very close to that of the original network. However, because sparse data structures induce overhead, current pruning techniques require the removal of up to 90% of the weights, and thus a high amount of redundancy in the original network, to speed up inference. We propose a novel technique for the selection of the weights to be pruned. Our technique is specifically designed to take the architecture of GPUs into account. By selecting the weights to be removed in adjacent groups that are aligned to the memory architecture, we are able to fully exploit the memory bandwidth. Our results show that, with the same number of weights removed, our technique speeds up a neural network by a factor of 1.57 at a pruning rate of 90% while maintaining the same accuracy when compared to state-of-the-art pruning techniques.	en
dc.identifier.pissn	0177-0454
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/33865
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e.V., Fachgruppe PARS
dc.relation.ispartof	PARS-Mitteilungen: Vol. 35, Nr. 1
dc.title	Weight Pruning for Deep Neural Networks on GPUs	en
dc.type	Text/Journal Article
gi.citation.endPage	62
gi.citation.publisherPlace	Berlin
gi.citation.startPage	51
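
The abstract describes removing weights in adjacent, memory-aligned groups rather than one at a time. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `prune_aligned_groups`, the `group_size` parameter, and the use of the summed absolute value as the group score are all assumptions made for illustration, since the record does not state the paper's actual selection criterion or group width.

```python
def prune_aligned_groups(weights, group_size, prune_rate):
    """Zero out whole aligned groups of adjacent weights.

    Assumption: a group's importance is the sum of absolute values of
    its weights; the paper's actual criterion is not given in the abstract.
    """
    if group_size <= 0 or len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a positive multiple of group_size")
    if not 0.0 <= prune_rate <= 1.0:
        raise ValueError("prune_rate must be in [0, 1]")

    n_groups = len(weights) // group_size

    # Score each aligned group (group g covers indices g*group_size .. (g+1)*group_size - 1).
    scores = [
        sum(abs(w) for w in weights[g * group_size:(g + 1) * group_size])
        for g in range(n_groups)
    ]

    # Select the lowest-scoring groups until the requested rate is reached.
    n_prune = int(n_groups * prune_rate)
    to_prune = sorted(range(n_groups), key=lambda g: scores[g])[:n_prune]

    pruned = list(weights)
    for g in to_prune:
        pruned[g * group_size:(g + 1) * group_size] = [0.0] * group_size
    return pruned


# Hypothetical example: 8 weights, groups of 2, half of the groups removed.
example = [0.1, -0.2, 3.0, 4.0, 0.05, 0.0, -2.0, 1.0]
result = prune_aligned_groups(example, group_size=2, prune_rate=0.5)
```

Because every removed group starts at a multiple of `group_size`, the surviving weights stay in contiguous aligned blocks, which is the property the abstract credits for better use of GPU memory bandwidth. On a real GPU the group width would presumably be matched to the memory transaction size, which this sketch leaves as a free parameter.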

Files

Original bundle

Name: PARS2019_paper_9.pdf
Size: 422.62 KB
Format: Adobe Portable Document Format

Collections

PARS-Mitteilungen 2020