A majority of recent successes in deep reinforcement learning are based on minimizing the squared Bellman error. Training is often unstable due to fast-changing target $Q$-values, and target networks are employed to stabilize it by using an additional set of lagging parameters. Despite their advantages, target networks can inhibit the propagation of newly encountered rewards, which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike with target networks, the regularization here is explicit, which not only enables us to use up-to-date parameters but also lets us control the strength of the regularization. This leads to a fast yet stable training method. Across a range of Atari environments, we demonstrate empirical improvements over target-network-based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to the standard squared Bellman error.
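The abstract does not give the loss in closed form, so the following is only a minimal sketch of the idea under stated assumptions: the bootstrap target is computed from the up-to-date parameters, and a quadratic penalty (an assumed form of the functional regularizer) keeps the current $Q$-values close to those of a lagging copy. The function name `fr_bellman_loss`, the weight `kappa`, and the array names are hypothetical and are not taken from the paper.

```python
import numpy as np


def fr_bellman_loss(q_sa, q_next, q_sa_lagging, reward, done,
                    gamma=0.99, kappa=1.0):
    """Squared Bellman error plus an assumed quadratic functional regularizer.

    q_sa         : Q(s, a) from the up-to-date parameters, shape (batch,)
    q_next       : Q(s', .) from the up-to-date parameters, shape (batch, actions)
    q_sa_lagging : Q(s, a) from a lagging copy of the parameters, shape (batch,)
    reward, done : shape (batch,); done is 1.0 at terminal transitions
    kappa        : hypothetical regularization weight (kappa = 0 disables it)
    """
    # Bootstrap target uses the up-to-date Q-values, not a target network.
    # In an autodiff framework this term would be treated as a constant.
    target = reward + gamma * (1.0 - done) * q_next.max(axis=1)
    bellman_error = (target - q_sa) ** 2

    # Explicit regularization in function (output) space: penalize drift of
    # the current Q-values away from the lagging network's Q-values.
    regularizer = kappa * (q_sa - q_sa_lagging) ** 2

    return float(np.mean(bellman_error + regularizer))
```

In this sketch, setting `kappa` to zero recovers the plain squared Bellman error with an up-to-date target, while larger values pull the $Q$-function more strongly toward the lagging copy; this is one way to read the abstract's remark that the regularization can be controlled.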
Author Information
Alexandre Piche (Mila)
Joseph Marino (DeepMind)
Gian Maria Marconi (RIKEN)
Valentin Thomas (Mila)
Chris Pal (Montreal Institute for Learning Algorithms, École Polytechnique, Université de Montréal)
Mohammad Emtiyaz Khan (RIKEN)
Emtiyaz Khan (also known as Emti) is a team leader at the RIKEN Center for Advanced Intelligence Project (AIP) in Tokyo, where he leads the Approximate Bayesian Inference Team. He is also a visiting professor at the Tokyo University of Agriculture and Technology (TUAT). Previously, he was a postdoc and then a scientist at École Polytechnique Fédérale de Lausanne (EPFL), where he also taught two large machine learning courses and received a teaching award. He completed his PhD in machine learning at the University of British Columbia in 2012. The main goal of Emti’s research is to understand the principles of learning from data and use them to develop algorithms that can learn like living beings. For the past 10 years, his work has focused on developing Bayesian methods that could lead to such fundamental principles. The Approximate Bayesian Inference Team now continues to use these principles, as well as derive new ones, to solve real-world problems.
More from the Same Authors
-
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer -
2022 Poster: On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation »
Valentin Thomas -
2022 : Can Calibration Improve Sample Prioritization? »
Ganesh Tata · Gautham Krishna Gudur · Gopinath Chennupati · Mohammad Emtiyaz Khan -
2022 : Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates »
Wu Lin · Valentin Duruisseaux · Melvin Leok · Frank Nielsen · Mohammad Emtiyaz Khan · Mark Schmidt -
2022 : Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models »
Vikram Voleti · Chris Pal · Adam Oberman -
2022 : Can Large Language Models Build Causal Graphs? »
Stephanie Long · Tibor Schuster · Alexandre Piche -
2022 : Implicit Offline Reinforcement Learning via Supervised Learning »
Alexandre Piche · Rafael Pardinas · David Vazquez · Igor Mordatch · Chris Pal -
2022 : A General-Purpose Neural Architecture for Geospatial Systems »
Martin Weiss · Nasim Rahaman · Frederik Träuble · Francesco Locatello · Alexandre Lacoste · Yoshua Bengio · Erran Li Li · Chris Pal · Bernhard Schölkopf -
2022 : Invited Keynote 2 »
Mohammad Emtiyaz Khan -
2022 Poster: Attention-based Neural Cellular Automata »
Mattie Tesfaldet · Derek Nowrouzezahrai · Chris Pal -
2022 Poster: Neural Attentive Circuits »
Martin Weiss · Nasim Rahaman · Francesco Locatello · Chris Pal · Yoshua Bengio · Bernhard Schölkopf · Erran Li Li · Nicolas Ballas -
2022 Poster: MCVD - Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation »
Vikram Voleti · Alexia Jolicoeur-Martineau · Chris Pal -
2022 Poster: The Role of Baselines in Policy Gradient Optimization »
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Dual Parameterization of Sparse Variational Gaussian Processes »
Vincent ADAM · Paul Chang · Mohammad Emtiyaz Khan · Arno Solin -
2021 Poster: Knowledge-Adaptation Priors »
Mohammad Emtiyaz Khan · Siddharth Swaroop -
2021 Poster: Iterative Amortized Policy Optimization »
Joseph Marino · Alexandre Piche · Alessandro Davide Ialongo · Yisong Yue -
2020 Poster: Measuring Systematic Generalization in Neural Proof Generation with Transformers »
Nicolas Gontier · Koustuv Sinha · Siva Reddy · Chris Pal -
2019 : Poster Session »
Pravish Sainath · Mohamed Akrout · Charles Delahunt · Nathan Kutz · Guangyu Robert Yang · Joseph Marino · L F Abbott · Nicolas Vecoven · Damien Ernst · andrew warrington · Michael Kagan · Kyunghyun Cho · Kameron Harris · Leopold Grinberg · John J. Hopfield · Dmitry Krotov · Taliah Muhammad · Erick Cobos · Edgar Walker · Jacob Reimer · Andreas Tolias · Alexander Ecker · Janaki Sheth · Yu Zhang · Maciej Wołczyk · Jacek Tabor · Szymon Maszke · Roman Pogodin · Dane Corneil · Wulfram Gerstner · Baihan Lin · Guillermo Cecchi · Jenna M Reinen · Irina Rish · Guillaume Bellec · Darjan Salaj · Anand Subramoney · Wolfgang Maass · Yueqi Wang · Ari Pakman · Jin Hyung Lee · Liam Paninski · Bryan Tripp · Colin Graber · Alex Schwing · Luke Prince · Gabriel Ocker · Michael Buice · Benjamin Lansdell · Konrad Kording · Jack Lindsey · Terrence Sejnowski · Matthew Farrell · Eric Shea-Brown · Nicolas Farrugia · Victor Nepveu · Jiwoong Im · Kristin Branson · Brian Hu · Ramakrishnan Iyer · Stefan Mihalas · Sneha Aenugu · Hananel Hazan · Sihui Dai · Tan Nguyen · Doris Tsao · Richard Baraniuk · Anima Anandkumar · Hidenori Tanaka · Aran Nayebi · Stephen Baccus · Surya Ganguli · Dean Pospisil · Eilif Muller · Jeffrey S Cheng · Gaël Varoquaux · Kamalaker Dadi · Dimitrios C Gklezakos · Rajesh PN Rao · Anand Louis · Christos Papadimitriou · Santosh Vempala · Naganand Yadati · Daniel Zdeblick · Daniela M Witten · Nicholas Roberts · Vinay Prabhu · Pierre Bellec · Poornima Ramesh · Jakob H Macke · Santiago Cadena · Guillaume Bellec · Franz Scherr · Owen Marschall · Robert Kim · Hannes Rapp · Marcio Fonseca · Oliver Armitage · Jiwoong Im · Thomas Hardcastle · Abhishek Sharma · Wyeth Bair · Adrian Valente · Shane Shang · Merav Stern · Rutuja Patil · Peter Wang · Sruthi Gorantla · Peter Stratton · Tristan Edwards · Jialin Lu · Martin Ester · Yurii Vlasov · Siavash Golkar -
2019 : Poster Session »
Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn -
2019 Poster: Real-Time Reinforcement Learning »
Simon Ramstedt · Chris Pal -
2019 Poster: Approximate Inference Turns Deep Networks into Gaussian Processes »
Mohammad Emtiyaz Khan · Alexander Immer · Ehsan Abedi · Maciej Korzepa -
2019 Poster: Practical Deep Learning with Bayesian Principles »
Kazuki Osawa · Siddharth Swaroop · Mohammad Emtiyaz Khan · Anirudh Jain · Runa Eschenhagen · Richard Turner · Rio Yokota -
2019 Tutorial: Deep Learning with Bayesian Principles »
Mohammad Emtiyaz Khan -
2018 : Poster Session 1 »
Kyle H Ambert · Brandon Araki · Xiya Cao · Sungjoon Choi · Hao(Jackson) Cui · Jonas Degrave · Yaqi Duan · Mattie Fellows · Carlos Florensa · Karan Goel · Aditya Gopalan · Ming-Xu Huang · Jonathan Hunt · Cyril Ibrahim · Brian Ichter · Maximilian Igl · Zheng Tracy Ke · Igor Kiselev · Anuj Mahajan · Arash Mehrjou · Karl Pertsch · Alexandre Piche · Nicholas Rhinehart · Thomas Ringstrom · Reazul Hasan Russel · Oleh Rybkin · Ion Stoica · Sharad Vikram · Angelina Wang · Ting-Han Wei · Abigail H Wen · I-Chen Wu · Zhengwei Wu · Linhai Xie · Dinghan Shen -
2018 : Probabilistic Planning with Sequential Monte Carlo (Alexandre Piché) »
Alexandre Piche -
2018 Poster: Manifold Structured Prediction »
Alessandro Rudi · Carlo Ciliberto · Gian Maria Marconi · Lorenzo Rosasco -
2018 Poster: A General Method for Amortizing Variational Filtering »
Joseph Marino · Milan Cvitkovic · Yisong Yue -
2017 : Poster session + Coffee break »
Mikael Kågebäck · Igor Melnyk · Amir-Hossein Karimi · Gino Brunner · Ershad Banijamali · Chris Donahue · Jake Zhao · Giambattista Parascandolo · Valentin Thomas · Abhishek Kumar · Chris Burgess · Amanda Nilsson · Maria Larsson · Cian Eastwood · Momchil Peychev -
2015 Poster: Kullback-Leibler Proximal Variational Inference »
Mohammad Emtiyaz Khan · Pierre Baque · François Fleuret · Pascal Fua -
2014 Poster: Decoupled Variational Gaussian Inference »
Mohammad Emtiyaz Khan -
2012 Poster: Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression »
Mohammad Emtiyaz Khan · Shakir Mohamed · Kevin Murphy -
2010 Poster: Variational bounds for mixed-data factor analysis »
Mohammad Emtiyaz Khan · Benjamin Marlin · Guillaume Bouchard · Kevin Murphy -
2009 Oral: Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models »
Baback Moghaddam · Benjamin Marlin · Mohammad Emtiyaz Khan · Kevin Murphy -
2009 Poster: Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models »
Baback Moghaddam · Benjamin Marlin · Mohammad Emtiyaz Khan · Kevin Murphy