One of the key behavioral characteristics used in neuroscience to determine whether the subject of study, be it a rodent or a human, exhibits model-based learning is effective adaptation to local changes in the environment. In reinforcement learning, however, recent work has shown that modern deep model-based reinforcement-learning (MBRL) methods adapt poorly to such changes. One explanation for this mismatch is that MBRL methods are typically designed with sample efficiency on a single task in mind, whereas the requirements for effective adaptation are substantially higher, both for the learned world model and for the planning routine. One particularly challenging requirement is that the learned world model must remain sufficiently accurate throughout the relevant parts of the state space. This is difficult for deep-learning-based world models due to catastrophic forgetting, and while a replay buffer can mitigate its effects, the traditional first-in-first-out (FIFO) replay buffer precludes effective adaptation because it retains stale data. In this work, we show that a conceptually simple variation of this traditional replay buffer overcomes this limitation. By removing from the buffer only those samples that lie in the local neighbourhood of newly observed samples, deep world models can be built that maintain their accuracy across the state space while also adapting effectively to changes in the reward function. We demonstrate this by applying our replay-buffer variation to the classical Dyna method, as well as to recent methods such as PlaNet and DreamerV2, showing for the first time that deep model-based methods can achieve effective adaptation.
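The abstract only describes the buffer mechanism at a high level. As a rough illustration, the sketch below shows one way such a local-forgetting replay buffer could look in Python; the class name, the Euclidean distance test, and the `radius` parameter are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from collections import deque


class LocalForgettingReplayBuffer:
    """Replay buffer that, on insertion, drops stored transitions whose states
    lie close to the newly observed state, instead of only evicting the oldest
    sample (a sketch of the local-forgetting idea, not the paper's implementation)."""

    def __init__(self, capacity, radius):
        self.capacity = capacity   # maximum number of stored transitions
        self.radius = radius       # neighbourhood radius in state space (assumed Euclidean)
        self.buffer = deque()

    def add(self, state, action, reward, next_state):
        state = np.asarray(state, dtype=np.float32)
        next_state = np.asarray(next_state, dtype=np.float32)
        # Remove stored transitions whose state is within `radius` of the new state,
        # so fresh data about this region replaces potentially stale data.
        self.buffer = deque(
            (s, a, r, ns) for (s, a, r, ns) in self.buffer
            if np.linalg.norm(s - state) > self.radius
        )
        # Fall back to FIFO eviction if the buffer is still at capacity.
        while len(self.buffer) >= self.capacity:
            self.buffer.popleft()
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=np.random):
        # Uniformly sample a minibatch of stored transitions for model training.
        idx = rng.choice(len(self.buffer),
                         size=min(batch_size, len(self.buffer)),
                         replace=False)
        return [self.buffer[i] for i in idx]
```

The design choice to combine local removal with a FIFO fallback is one plausible reading of "a conceptually simple variation of this traditional replay buffer"; how the neighbourhood is defined for high-dimensional observations is not specified here and would depend on the method it is plugged into (e.g., Dyna, PlaNet, or DreamerV2).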
Author Information
Ali Rahimi-Kalahroudi (Mila - Université de Montréal)
Janarthanan Rajendran (Mila)
Ida Momennejad (Microsoft Research)
Harm Van Seijen (Microsoft Research)
Sarath Chandar (Mila / École Polytechnique de Montréal)
More from the Same Authors
- 2021: IIRC: Incremental Implicitly-Refined Classification
  Mohamed Abdelsalam · Mojtaba Faramarzi · Shagun Sodhani · Sarath Chandar
- 2022: Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
  Riashat Islam · Manan Tomar · Alex Lamb · Hongyu Zang · Yonathan Efroni · Dipendra Misra · Aniket Didolkar · Xin Li · Harm Van Seijen · Remi Tachet des Combes · John Langford
- 2022: Imitating Human Behaviour with Diffusion Models
  Tim Pearce · Tabish Rashid · Anssi Kanervisto · David Bignell · Mingfei Sun · Raluca Georgescu · Sergio Valcarcel Macua · Shan Zheng Tan · Ida Momennejad · Katja Hofmann · Sam Devlin
- 2022: PatchBlender: A Motion Prior for Video Transformers
  Gabriele Prato · Yale Song · Janarthanan Rajendran · R Devon Hjelm · Neel Joshi · Sarath Chandar
- 2022: Panel Discussion: Opportunities and Challenges
  Kenneth Norman · Janice Chen · Samuel J Gershman · Albert Gu · Sepp Hochreiter · Ida Momennejad · Hava Siegelmann · Sainbayar Sukhbaatar
- 2022: Ida Momennejad: "Neuro-inspired Memory in Reinforcement Learning: State of the art, Challenges, and Opportunities"
  Ida Momennejad
- 2022: Attention in Task-sets, Planning, and the Prefrontal Cortex
  Ida Momennejad
- 2022 Poster: Interaction-Grounded Learning with Action-Inclusive Feedback
  Tengyang Xie · Akanksha Saran · Dylan J Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford
- 2020 Workshop: Deep Reinforcement Learning
  Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah
- 2020 Poster: Meta-Learning Requires Meta-Augmentation
  Janarthanan Rajendran · Alexander Irpan · Eric Jang
- 2020 Poster: The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
  Harm Van Seijen · Hadi Nekoei · Evan Racah · Sarath Chandar
- 2019 Poster: Discovery of Useful Questions as Auxiliary Tasks
  Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh
- 2019 Poster: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
  Harm Van Seijen · Mehdi Fatemi · Arash Tavakoli
- 2019 Oral: Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
  Harm Van Seijen · Mehdi Fatemi · Arash Tavakoli
- 2017 Poster: Hybrid Reward Architecture for Reinforcement Learning
  Harm Van Seijen · Mehdi Fatemi · Romain Laroche · Joshua Romoff · Tavian Barnes · Jeffrey Tsang