The common paradigm in reinforcement learning (RL) assumes that an agent interacts frequently with its environment and learns from its own collected experience. This mode of operation is prohibitive for many complex real-world problems, where repeatedly collecting diverse data is expensive (e.g., robotics or educational agents) and/or dangerous (e.g., healthcare). Offline RL instead focuses on training agents from logged data, with no further environment interaction. Offline RL promises a data-driven RL paradigm and carries the potential to scale up end-to-end learning approaches to real-world decision-making tasks such as robotics, recommendation systems, dialogue generation, autonomous driving, healthcare systems, and other safety-critical applications. Recently, successful deep RL algorithms have been adapted to the offline setting and have demonstrated potential in a number of domains; however, significant algorithmic and practical challenges remain. The goal of this workshop is to bring attention to offline RL from both within and outside the RL community; to discuss the algorithmic challenges that need to be addressed, potential real-world applications, and current limitations; and to arrive at concrete problem statements and evaluation protocols, inspired by real-world applications, for the research community to work on.
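The setting described above can be illustrated concretely: the learner sees only a fixed batch of (state, action, reward, next state) transitions gathered earlier by some behavior policy, and never queries the environment. Below is a minimal sketch of this idea using tabular Q-learning on a tiny hypothetical MDP; the dataset and MDP are invented for illustration and are not from the workshop materials.

```python
import numpy as np

# Hypothetical logged dataset: (state, action, reward, next_state, done)
# tuples collected by some behavior policy. In offline RL the learner
# only replays this fixed batch; it never interacts with the environment.
dataset = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (0, 0, 0.0, 0, False),
    (1, 0, 0.0, 0, False),
]

n_states, n_actions = 3, 2
gamma, lr = 0.9, 0.5
Q = np.zeros((n_states, n_actions))

# Batch Q-learning: repeatedly sweep the fixed dataset and apply the
# standard TD update toward r + gamma * max_a' Q(s', a').
for _ in range(200):
    for s, a, r, s_next, done in dataset:
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])

# Policy derived purely from logged data.
greedy = Q.argmax(axis=1)
```

Note that the greedy policy is only trustworthy on states and actions well covered by the batch; handling poorly covered (out-of-distribution) actions is exactly the kind of algorithmic challenge the workshop abstract refers to.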
For details on submission please visit: https://offline-rl-neurips.github.io/ (Submission deadline: October 9, 11:59 pm PT)
Speakers:
Emma Brunskill (Stanford)
Finale Doshi-Velez (Harvard)
John Langford (Microsoft Research)
Nan Jiang (UIUC)
Brandyn White (Waymo Research)
Nando de Freitas (DeepMind)
Sat 8:50 a.m. - 9:00 a.m. | Introduction | Aviral Kumar, George Tucker, Rishabh Agarwal
Sat 9:00 a.m. - 9:30 a.m. | Offline RL (Talk) | Nando de Freitas
Sat 9:30 a.m. - 9:40 a.m. | Q&A w/ Nando de Freitas (Q&A)
Sat 9:40 a.m. - 9:50 a.m. | Contributed Talk 1: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs (Talk) | Aayam Shrestha (Oregon State University)*; Stefan Lee (Oregon State University); Prasad Tadepalli (Oregon State University); Alan Fern (Oregon State University)
Sat 9:50 a.m. - 10:00 a.m. | Contributed Talk 2: Chaining Behaviors from Data with Model-Free Reinforcement Learning (Talk) | Avi Singh
Sat 10:00 a.m. - 10:10 a.m. | Contributed Talk 3: Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets (Talk) | Seunghyun Lee, Younggyo Seo, Kimin Lee
Sat 10:10 a.m. - 10:20 a.m. | Contributed Talk 4: Addressing Extrapolation Error in Deep Offline Reinforcement Learning (Talk) | Caglar Gulcehre
Sat 10:20 a.m. - 10:30 a.m. | Q&A for Contributed Talks 1 (Q&A)
Sat 10:30 a.m. - 11:20 a.m. | Poster Session 1 (gather.town) (Poster Session)
Sat 11:20 a.m. - 11:50 a.m. | Causal Structure Discovery in RL (Talk) | John Langford
Sat 11:50 a.m. - 12:00 p.m. | Q&A w/ John Langford (Q&A)
Sat 12:00 p.m. - 1:00 p.m. | Panel | Emma Brunskill, Nan Jiang, Nando de Freitas, Finale Doshi-Velez, Sergey Levine, John Langford, Lihong Li, George Tucker, Rishabh Agarwal, Aviral Kumar
Sat 1:10 p.m. - 1:40 p.m. | Learning a Multi-Agent Simulator from Offline Demonstrations (Talk) | Brandyn White
Sat 1:40 p.m. - 1:50 p.m. | Q&A w/ Brandyn White (Q&A)
Sat 1:50 p.m. - 2:20 p.m. | Towards Reliable Validation and Evaluation for Offline RL (Talk) | Nan Jiang
Sat 2:20 p.m. - 2:30 p.m. | Q&A w/ Nan Jiang (Q&A)
Sat 2:30 p.m. - 2:40 p.m. | Contributed Talk 5: Latent Action Space for Offline Reinforcement Learning (Talk) | Wenxuan Zhou
Sat 2:40 p.m. - 2:50 p.m. | Contributed Talk 6: What are the Statistical Limits for Batch RL with Linear Function Approximation? (Talk) | Ruosong Wang
Sat 2:50 p.m. - 3:00 p.m. | Contributed Talk 7: Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning (Talk) | Sam Daulton, Hong Namkoong
Sat 3:00 p.m. - 3:10 p.m. | Contributed Talk 8: Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation (Talk) | Diksha Garg
Sat 3:10 p.m. - 3:20 p.m. | Q&A for Contributed Talks 2 (Q&A)
Sat 3:20 p.m. - 4:30 p.m. | Poster Session 2 (gather.town) (Poster Session)
Sat 4:30 p.m. - 5:00 p.m. | Counterfactuals and Offline RL (Talk) | Emma Brunskill
Sat 5:00 p.m. - 5:10 p.m. | Q&A w/ Emma Brunskill (Q&A)
Sat 5:10 p.m. - 5:40 p.m. | Batch RL Models Built for Validation (Talk) | Finale Doshi-Velez
Sat 5:40 p.m. - 5:50 p.m. | Q&A w/ Finale Doshi-Velez (Q&A)
Sat 5:50 p.m. - 6:00 p.m. | Closing Remarks
Author Information
Aviral Kumar (UC Berkeley)
Rishabh Agarwal (Google Research, Brain Team)
I am a research associate in the Google Brain team in Montréal. My research interests mainly revolve around Deep Reinforcement Learning (RL), often with the goal of making RL methods suitable for real-world problems.
George Tucker (Google Brain)
Lihong Li (Google Brain)
Doina Precup (McGill University / Mila / DeepMind Montreal)