Offline reinforcement learning (RL) is the task of learning from a fixed batch of data. Because of errors in value estimation for out-of-distribution actions, most offline RL algorithms constrain or regularize the policy toward the actions contained in the dataset. Since these methods are built on pre-existing RL algorithms, the modifications needed to make an algorithm work offline come at the cost of additional complexity: offline RL algorithms introduce new hyperparameters and often leverage secondary components such as generative models, while adjusting the underlying RL algorithm. In this paper we aim to make a deep RL algorithm work offline with minimal changes. We find that we can match the performance of state-of-the-art offline RL algorithms by simply adding a behavior cloning term to the policy update of an online RL algorithm and normalizing the data. The resulting algorithm is a baseline that is simple to implement and tune, and it more than halves the overall run time by removing the computational overheads of previous methods.
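The two modifications the abstract describes can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function names, the `alpha` hyperparameter, and the exact way the Q term is rescaled against the behavior cloning term are assumptions made for the sake of the example.

```python
import numpy as np

def normalize_states(states, eps=1e-3):
    """Normalize each state feature of the dataset to zero mean, unit variance."""
    mean = states.mean(axis=0, keepdims=True)
    std = states.std(axis=0, keepdims=True) + eps  # eps avoids division by zero
    return (states - mean) / std

def actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Online-RL actor objective (maximize Q) plus a behavior cloning regularizer.

    The Q term is rescaled by its average magnitude so the two terms stay on a
    comparable scale regardless of the Q-value range (an illustrative choice).
    """
    lmbda = alpha / (np.abs(q_values).mean() + 1e-8)
    rl_term = -lmbda * q_values.mean()                           # maximize Q
    bc_term = ((policy_actions - dataset_actions) ** 2).mean()   # BC: match dataset actions
    return rl_term + bc_term
```

In practice the BC term pulls the learned policy toward actions seen in the batch, which is what limits the value-estimation errors on out-of-distribution actions mentioned above.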
Author Information
Scott Fujimoto (McGill University)
Shixiang (Shane) Gu (Google Brain, University of Cambridge)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: A Minimalist Approach to Offline Reinforcement Learning
More from the Same Authors
- 2021 : IL-flOw: Imitation Learning from Observation using Normalizing Flows
  Wei-Di Chang · Juan Camilo Gamboa Higuera · Scott Fujimoto · David Meger · Gregory Dudek
- 2021 : Distributional Decision Transformer for Offline Hindsight Information Matching
  Hiroki Furuta · Yutaka Matsuo · Shixiang (Shane) Gu
- 2021 Poster: Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
  Hiroki Furuta · Tadashi Kozuno · Tatsuya Matsushima · Yutaka Matsuo · Shixiang (Shane) Gu
- 2020 Poster: An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
  Scott Fujimoto · David Meger · Doina Precup
- 2019 : Poster Session
  Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
- 2018 Poster: Data-Efficient Hierarchical Reinforcement Learning
  Ofir Nachum · Shixiang (Shane) Gu · Honglak Lee · Sergey Levine
- 2018 Poster: Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation
  Edward Smith · Scott Fujimoto · David Meger
- 2017 Poster: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
  Shixiang (Shane) Gu · Timothy Lillicrap · Richard Turner · Zoubin Ghahramani · Bernhard Schölkopf · Sergey Levine
- 2015 Poster: Particle Gibbs for Infinite Hidden Markov Models
  Nilesh Tripuraneni · Shixiang (Shane) Gu · Hong Ge · Zoubin Ghahramani
- 2015 Poster: Neural Adaptive Sequential Monte Carlo
  Shixiang (Shane) Gu · Zoubin Ghahramani · Richard Turner