Weight Pruning for Deep Neural Networks on GPUs

dc.contributor.author: Hartenstein, Thomas
dc.contributor.author: Maier, Daniel
dc.contributor.author: Cosenza, Biagio
dc.contributor.author: Juurlink, Ben
dc.date.accessioned: 2020-08-25T09:05:21Z
dc.date.available: 2020-08-25T09:05:21Z
dc.date.issued: 2020
dc.description.abstract: Neural networks are becoming more complex than ever before, leading to resource-demanding training processes that have been the target of optimization. With embedded real-time applications such as traffic identification in self-driving cars relying on neural networks, inference latency is becoming more important. The size of the model has been identified as an important target of optimization, as smaller networks also require fewer computations for inference. One way to shrink a network is to remove small weights: weight pruning. This technique has been exploited in a number of ways and has been shown to significantly lower the number of weights while keeping accuracy very close to that of the original network. However, current pruning techniques must remove up to 90% of the weights before inference actually speeds up, because sparse data structures induce overhead; this in turn requires a high amount of redundancy in the original network. We propose a novel technique for selecting the weights to be pruned. Our technique is specifically designed to take the architecture of GPUs into account. By selecting the weights to be removed in adjacent groups that are aligned to the memory architecture, we are able to fully exploit the memory bandwidth. Our results show that, with the same number of weights removed, our technique is able to speed up a neural network by a factor of 1.57 at a pruning rate of 90% while maintaining the same accuracy when compared to state-of-the-art pruning techniques.
dc.identifier.pissn: 0177-0454
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/33865
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V., Fachgruppe PARS
dc.relation.ispartof: PARS-Mitteilungen: Vol. 35, No. 1
dc.title: Weight Pruning for Deep Neural Networks on GPUs
dc.type: Text/Journal Article
gi.citation.endPage: 62
gi.citation.publisherPlace: Berlin
gi.citation.startPage: 51
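
The abstract contrasts pruning individual small weights with pruning adjacent, memory-aligned groups of weights. The following is a minimal sketch of that difference, not the authors' implementation: the function names, the `group_size` parameter, and the choice to score a group by the sum of its absolute weights are all assumptions made for illustration. On a GPU, the group size would presumably be chosen to match a memory transaction width, which this record does not specify.

```python
def prune_unstructured(weights, rate):
    """Zero the `rate` fraction of individual weights with the smallest magnitude."""
    n_prune = int(len(weights) * rate)
    # Indices sorted by ascending magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_grouped(weights, rate, group_size):
    """Zero whole aligned groups of `group_size` adjacent weights.

    Groups start at indices that are multiples of `group_size`.
    Scoring a group by the sum of absolute values is an assumption
    of this sketch, not a criterion taken from the paper.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    n_groups = len(weights) // group_size
    n_prune = int(n_groups * rate)
    scores = [
        sum(abs(w) for w in weights[g * group_size:(g + 1) * group_size])
        for g in range(n_groups)
    ]
    # Groups sorted by ascending score; the first n_prune are removed.
    order = sorted(range(n_groups), key=lambda g: scores[g])
    drop = set(order[:n_prune])
    return [0.0 if (i // group_size) in drop else w for i, w in enumerate(weights)]


weights = [0.9, 0.01, 0.02, 0.03, 0.5, 0.04, 0.6, 0.7]
unstructured = prune_unstructured(weights, 0.5)
grouped = prune_grouped(weights, 0.5, group_size=2)
print(unstructured)  # zeros scattered across the array
print(grouped)       # zeros form contiguous, aligned runs
```

Both variants remove the same number of weights, but only the grouped variant leaves contiguous zero runs that can be skipped as whole blocks rather than tracked through per-element sparse indices.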

Files

Original bundle
Name: PARS2019_paper_9.pdf
Size: 422.62 KB
Format: Adobe Portable Document Format