Timezone: »
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.
Author Information
Gerhard Neumann (Graz University of Technology)
Jan Peters (TU Darmstadt & DFKI)
Jan Peters is a full professor (W3) for Intelligent Autonomous Systems at the Computer Science Department of the Technische Universitaet Darmstadt and at the same time a senior research scientist and group leader at the Max-Planck Institute for Intelligent Systems, where he heads the interdepartmental Robot Learning Group. Jan Peters has received the Dick Volz Best 2007 US PhD Thesis Runner-Up Award, the Robotics: Science & Systems - Early Career Spotlight, the INNS Young Investigator Award, and the IEEE Robotics & Automation Society‘s Early Career Award as well as numerous best paper awards. In 2015, he was awarded an ERC Starting Grant. Jan Peters has studied Computer Science, Electrical, Mechanical and Control Engineering at TU Munich and FernUni Hagen in Germany, at the National University of Singapore (NUS) and the University of Southern California (USC). He has received four Master‘s degrees in these disciplines as well as a Computer Science PhD from USC.
Related Events (a corresponding poster, oral, or spotlight)
-
2008 Poster: Fitted Q-iteration by Advantage Weighted Regression »
Wed. Dec 10th through Tue the 9th Room
More from the Same Authors
-
2020 : Differentiable Implicit Layers »
Andreas Look · Simona Doneva · Melih Kandemir · Rainer Gemulla · Jan Peters -
2022 : How crucial is Transformer in Decision Transformer? »
Max Siebenborn · Boris Belousov · Junning Huang · Jan Peters -
2022 : Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation »
Joao Carvalho · Mark Baierl · Julen Urain · Jan Peters -
2023 Competition: The Robot Air Hockey Challenge: Robust, Reliable, and Safe Learning Techniques for Real-world Robotics »
Puze Liu · Jonas Günster · Niklas Funk · Dong Chen · Haitham Bou Ammar · Davide Tateo · Jan Peters -
2022 Poster: Information-Theoretic Safe Exploration with Gaussian Processes »
Alessandro Bottero · Carlos Luis · Julia Vinogradska · Felix Berkenkamp · Jan Peters -
2020 Poster: Self-Paced Deep Reinforcement Learning »
Pascal Klink · Carlo D'Eramo · Jan Peters · Joni Pajarinen -
2020 Oral: Self-Paced Deep Reinforcement Learning »
Pascal Klink · Carlo D'Eramo · Jan Peters · Joni Pajarinen -
2017 : Panel Discussion »
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox -
2017 : Hierarchical Imitation and Reinforcement Learning for Robotics (Jan Peters) »
Jan Peters -
2016 Poster: Catching heuristics are optimal control policies »
Boris Belousov · Gerhard Neumann · Constantin Rothkopf · Jan Peters -
2015 Poster: Model-Based Relative Entropy Stochastic Search »
Abbas Abdolmaleki · Rudolf Lioutikov · Jan Peters · Nuno Lau · Luis Pualo Reis · Gerhard Neumann -
2014 Demonstration: Learning for Tactile Manipulation »
Tucker Hermans · Filipe Veiga · Janine Hölscher · Herke van Hoof · Jan Peters -
2013 Workshop: Advances in Machine Learning for Sensorimotor Control »
Thomas Walsh · Alborz Geramifard · Marc Deisenroth · Jonathan How · Jan Peters -
2013 Workshop: Planning with Information Constraints for Control, Reinforcement Learning, Computational Neuroscience, Robotics and Games. »
Hilbert J Kappen · Naftali Tishby · Jan Peters · Evangelos Theodorou · David H Wolpert · Pedro Ortega -
2013 Poster: Probabilistic Movement Primitives »
Alexandros Paraschos · Christian Daniel · Jan Peters · Gerhard Neumann -
2012 Poster: Algorithms for Learning Markov Field Policies »
Abdeslam Boularias · Oliver Kroemer · Jan Peters -
2011 Poster: A Non-Parametric Approach to Dynamic Programming »
Oliver Kroemer · Jan Peters -
2011 Oral: A Non-Parametric Approach to Dynamic Programming »
Oliver Kroemer · Jan Peters -
2010 Spotlight: Switched Latent Force Models for Movement Segmentation »
Mauricio A Alvarez · Jan Peters · Bernhard Schölkopf · Neil D Lawrence -
2010 Poster: Switched Latent Force Models for Movement Segmentation »
Mauricio A Alvarez · Jan Peters · Bernhard Schölkopf · Neil D Lawrence -
2010 Poster: Movement extraction by detecting dynamics switches and repetitions »
Silvia Chiappa · Jan Peters -
2009 Workshop: Probabilistic Approaches for Control and Robotics »
Marc Deisenroth · Hilbert J Kappen · Emo Todorov · Duy Nguyen-Tuong · Carl Edward Rasmussen · Jan Peters -
2008 Poster: Using Bayesian Dynamical Systems for Motion Template Libraries »
Silvia Chiappa · Jens Kober · Jan Peters -
2008 Poster: Policy Search for Motor Primitives in Robotics »
Jens Kober · Jan Peters -
2008 Oral: Policy Search for Motor Primitives in Robotics »
Jens Kober · Jan Peters -
2008 Poster: Local Gaussian Process Regression for Real Time Online Model Learning »
Duy Nguyen-Tuong · Matthias Seeger · Jan Peters -
2007 Workshop: Robotics Challenges for Machine Learning »
Jan Peters · Marc Toussaint -
2006 Workshop: Towards a New Reinforcement Learning? »
Jan Peters · Stefan Schaal · Drew Bagnell