Training a deep policy gradient-based neural network with asynchronous learners on a simulated robotic problem

Lötzsch, Winfried; Vitay, Julien; Hamker, Fred

dc.contributor.author	Lötzsch, Winfried
dc.contributor.author	Vitay, Julien
dc.contributor.author	Hamker, Fred
dc.contributor.editor	Eibl, Maximilian
dc.contributor.editor	Gaedke, Martin
dc.date.accessioned	2017-08-28T23:47:39Z
dc.date.available	2017-08-28T23:47:39Z
dc.date.issued	2017
dc.description.abstract	Recent advances in deep reinforcement learning methods have attracted a lot of attention because of their ability to use raw signals such as video streams as inputs, instead of pre-processed state variables. However, the most popular methods (value-based methods, e.g. deep Q-networks) focus on discrete action spaces (e.g. the left/right buttons), while realistic robotic applications usually require a continuous action space (for example the joint space). Policy gradient methods, such as stochastic policy gradient or deep deterministic policy gradient, aim to overcome this problem by allowing continuous action spaces. Despite their promise, they suffer from long training times, as they need huge numbers of interactions to converge. In this paper, we investigate to what extent a recent asynchronous parallel actor-critic approach, initially proposed to speed up discrete RL algorithms, could be used for the continuous control of robotic arms. We demonstrate the capabilities of this end-to-end learning algorithm on a simulated robotic arm with 2 degrees of freedom and discuss its applications to more realistic scenarios.	en
dc.identifier.doi	10.18420/in2017_214
dc.identifier.isbn	978-3-88579-669-5
dc.identifier.pissn	1617-5468
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik, Bonn
dc.relation.ispartof	INFORMATIK 2017
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-275
dc.subject	deep learning
dc.subject	reinforcement learning
dc.subject	robotics
dc.subject	policy gradient methods
dc.subject	actor critic
dc.subject	asynchronous learning
dc.subject	continuous action space
dc.title	Training a deep policy gradient-based neural network with asynchronous learners on a simulated robotic problem	en
gi.citation.endPage	2154
gi.citation.startPage	2143
gi.conference.date	25-29 September 2017
gi.conference.location	Chemnitz
gi.conference.sessiontitle	Deep Learning in heterogenen Datenbeständen

Files

Original bundle

Name: B29-1.pdf
Size: 541.64 KB
Format: Adobe Portable Document Format

Collections

P275 - INFORMATIK 2017