Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
Bogdan Mazoure · Ilya Kostrikov · Ofir Nachum · Jonathan Tompson

Reinforcement learning (RL) agents are widely used for solving complex sequential decision-making tasks, but still have difficulty generalizing to scenarios not seen during training. Prior online approaches demonstrated that using additional signals beyond the reward function, e.g. self-supervised learning (SSL), can lead to better generalization in RL agents, but these approaches struggle in the offline RL setting, i.e. when learning from a static dataset. We show that the performance of online algorithms for generalization in RL can be hindered in the offline setting by poor estimation of the similarity between observations. We propose a new theoretically motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior, where we quantify this similarity using generalized value functions. We show that GSF is general enough to recover existing SSL objectives while also improving zero-shot generalization performance on a complex offline RL benchmark, offline Procgen.
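
As a rough illustration of the idea in the abstract (grouping observations whose expected future behavior is similar, then training with a contrastive objective), the sketch below may help. It is not the authors' implementation: the function names `gvf_positive_pairs` and `contrastive_loss`, the quantile threshold used to select positives, and the InfoNCE-style loss are all assumptions made for this example, and the generalized value function estimates are taken as given rather than learned.

```python
import numpy as np


def gvf_positive_pairs(gvf, quantile=0.1):
    """Hypothetical helper: mark observation pairs as positives when their
    generalized value function (GVF) estimates are close to each other."""
    gvf = np.asarray(gvf, dtype=float).reshape(len(gvf), -1)
    # Pairwise Euclidean distance between GVF estimates.
    dist = np.linalg.norm(gvf[:, None, :] - gvf[None, :, :], axis=-1)
    off_diag = ~np.eye(len(gvf), dtype=bool)
    # Assumption: the closest `quantile` fraction of pairs count as positives.
    threshold = np.quantile(dist[off_diag], quantile)
    return (dist <= threshold) & off_diag


def contrastive_loss(embeddings, positives, temperature=0.1):
    """InfoNCE-style loss that pulls together embeddings of positive pairs."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # an observation is not its own pair
    # Row-wise log-softmax, computed in a numerically stable way.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average the log-probability of positives for anchors that have any.
    has_pos = positives.any(axis=1)
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / positives[has_pos].sum(axis=1)
    return per_anchor.mean()


# Toy data: observations 0 and 1 share similar GVF estimates, as do 2 and 3.
gvf_estimates = np.array([0.0, 0.5, 4.0, 4.5])
pairs = gvf_positive_pairs(gvf_estimates, quantile=0.25)

# Embeddings that respect the GVF grouping versus embeddings that do not.
aligned = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
loss_aligned = contrastive_loss(aligned, pairs)
loss_misaligned = contrastive_loss(misaligned, pairs)
```

On this toy data the loss is lower for the embeddings that place observations with similar GVF estimates close together, which is the behavior such an objective is meant to encourage.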

Author Information

Bogdan Mazoure (McGill University)

Ph.D. student at MILA / McGill University, supervised by Doina Precup and Devon Hjelm. Interested in reinforcement learning, representation learning, mathematical statistics and density estimation.

Ilya Kostrikov (UC Berkeley)
Ofir Nachum (Google Brain)
Jonathan Tompson (Google Brain)

More from the Same Authors

  • 2021 : Implicit Behavioral Cloning »
    Pete Florence · Corey Lynch · Andy Zeng · Oscar Ramirez · Ayzaan Wahid · Laura Downs · Adrian Wong · Igor Mordatch · Jonathan Tompson
  • 2021 : Offline Reinforcement Learning with Implicit Q-Learning »
    Ilya Kostrikov · Ashvin Nair · Sergey Levine
  • 2021 : TRAIL: Near-Optimal Imitation Learning with Suboptimal Data »
    Mengjiao (Sherry) Yang · Sergey Levine · Ofir Nachum
  • 2021 : Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters »
    Kamyar Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum
  • 2022 : A Mixture-of-Expert Approach to RL-based Dialogue Management »
    Yinlam Chow · Azamat Tulepbergenov · Ofir Nachum · Dhawal Gupta · Moonkyung Ryu · Mohammad Ghavamzadeh · Craig Boutilier
  • 2022 : Multi-Environment Pretraining Enables Transfer to Action Limited Datasets »
    David Venuto · Mengjiao (Sherry) Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum
  • 2022 : Skill Acquisition by Instruction Augmentation on Offline Datasets »
    Ted Xiao · Harris Chan · Pierre Sermanet · Ayzaan Wahid · Anthony Brohan · Karol Hausman · Sergey Levine · Jonathan Tompson
  • 2022 : Interactive Language: Talking to Robots in Real Time »
    Corey Lynch · Pete Florence · Jonathan Tompson · Ayzaan Wahid · Tianli Ding · James Betker · Robert Baruch · Travis Armstrong
  • 2022 : Contrastive Value Learning: Implicit Models for Simple Offline RL »
    Bogdan Mazoure · Benjamin Eysenbach · Ofir Nachum · Jonathan Tompson
  • 2022 Workshop: 5th Robot Learning Workshop: Trustworthy Robotics »
    Alex Bewley · Roberto Calandra · Anca Dragan · Igor Gilitschenski · Emily Hannigan · Masha Itkina · Hamidreza Kasaei · Jens Kober · Danica Kragic · Nathan Lambert · Julien PEREZ · Fabio Ramos · Ransalu Senanayake · Jonathan Tompson · Vincent Vanhoucke · Markus Wulfmeier
  • 2022 Workshop: Foundation Models for Decision Making »
    Mengjiao (Sherry) Yang · Yilun Du · Jack Parker-Holder · Siddharth Karamcheti · Igor Mordatch · Shixiang (Shane) Gu · Ofir Nachum
  • 2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
    Jonathan N Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill
  • 2022 Poster: Chain of Thought Imitation with Procedure Cloning »
    Mengjiao (Sherry) Yang · Dale Schuurmans · Pieter Abbeel · Ofir Nachum
  • 2022 Poster: Multi-Game Decision Transformers »
    Kuang-Huei Lee · Ofir Nachum · Mengjiao (Sherry) Yang · Lisa Lee · Daniel Freeman · Sergio Guadarrama · Ian Fischer · Winnie Xu · Eric Jang · Henryk Michalewski · Igor Mordatch
  • 2022 Poster: Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters »
    Kamyar Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum
  • 2022 Poster: Improving Zero-Shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions »
    Bogdan Mazoure · Ilya Kostrikov · Ofir Nachum · Jonathan Tompson
  • 2021 : Implicit Behavioral Cloning Q&A »
    Pete Florence · Corey Lynch · Andy Zeng · Oscar Ramirez · Ayzaan Wahid · Laura Downs · Adrian Wong · Igor Mordatch · Jonathan Tompson
  • 2021 Poster: Automatic Data Augmentation for Generalization in Reinforcement Learning »
    Roberta Raileanu · Maxwell Goldstein · Denis Yarats · Ilya Kostrikov · Rob Fergus
  • 2020 Poster: Deep Reinforcement and InfoMax Learning »
    Bogdan Mazoure · Remi Tachet des Combes · Thang Long Doan · Philip Bachman · R Devon Hjelm
  • 2019 : Poster and Coffee Break 2 »
    Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall
  • 2019 : Poster Presentations »
    Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange
  • 2019 : Poster Session »
    Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
  • 2019 : Poster Spotlight 2 »
    Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare
  • 2019 : Contributed Talks »
    Kevin Lu · Matthew Hausknecht · Ofir Nachum
  • 2018 Poster: Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning »
    Supasorn Suwajanakorn · Noah Snavely · Jonathan Tompson · Mohammad Norouzi
  • 2018 Oral: Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning »
    Supasorn Suwajanakorn · Noah Snavely · Jonathan Tompson · Mohammad Norouzi
  • 2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
    Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans