With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that can assign rewards to offline trajectories using only a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration, yielding a similarity measure that can be interpreted as a reward, which an offline RL algorithm can then use to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
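The idea in the abstract can be sketched in a few lines of code. This is a minimal illustrative sketch, not the paper's implementation: the entropic-regularized Sinkhorn solver, the Euclidean state-distance cost, the uniform marginals, and the regularization value `reg` are all assumptions made here for the example.

```python
import numpy as np

def sinkhorn(C, reg=0.5, n_iters=200):
    """Entropic optimal transport plan between uniform marginals
    via Sinkhorn iterations (an illustrative choice of solver)."""
    n, m = C.shape
    K = np.exp(-C / reg)          # Gibbs kernel of the cost matrix
    a = np.ones(n) / n            # uniform marginal over agent steps
    b = np.ones(m) / m            # uniform marginal over expert steps
    v = np.ones(m) / m
    for _ in range(n_iters):      # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_rewards(traj, expert, reg=0.5):
    """Label each step of an unlabeled trajectory with a reward derived
    from its optimal-transport alignment to an expert demonstration."""
    # Cost: pairwise Euclidean distance between agent and expert states
    # (the actual cost function used in practice may differ).
    C = np.linalg.norm(traj[:, None, :] - expert[None, :, :], axis=-1)
    P = sinkhorn(C, reg)
    # Per-step reward: negative transport cost charged to that step,
    # so steps well aligned with the demonstration are rewarded more.
    return -(P * C).sum(axis=1)
```

A trajectory close to the demonstration receives rewards near zero, while a distant one is penalized; these labels can then be handed to any off-the-shelf offline RL algorithm.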
Author Information
Yicheng Luo (University College London)
Zhengyao Jiang (University College London)
Samuel Cohen (University College London)
Edward Grefenstette (Cohere & University College London)
Marc Deisenroth (University College London)

Professor Marc Deisenroth is the DeepMind Chair in Artificial Intelligence at University College London and the Deputy Director of UCL's Centre for Artificial Intelligence. He also holds visiting faculty positions at the University of Johannesburg and Imperial College London. Marc's research interests center around data-efficient machine learning, probabilistic modeling, and autonomous decision making. Marc was Program Chair of EWRL 2012, Workshops Chair of RSS 2013, EXPO Co-Chair of ICML 2020, and Tutorials Co-Chair of NeurIPS 2021. In 2019, Marc co-organized the Machine Learning Summer School in London. He received paper awards at ICRA 2014, ICCAS 2016, and ICML 2020. He is co-author of the book [Mathematics for Machine Learning](https://mml-book.github.io), published by Cambridge University Press (2020).
More from the Same Authors
- 2021 : MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research »
  Mikayel Samvelyan · Robert Kirk · Vitaly Kurin · Jack Parker-Holder · Minqi Jiang · Eric Hambro · Fabio Petroni · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel
- 2021 : Cross-Domain Imitation Learning via Optimal Transport »
  Arnaud Fickinger · Samuel Cohen · Stuart Russell · Brandon Amos
- 2021 : Grounding Aleatoric Uncertainty in Unsupervised Environment Design »
  Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster
- 2021 : Imitation Learning from Pixel Observations for Continuous Control »
  Samuel Cohen · Brandon Amos · Marc Deisenroth · Mikael Henaff · Eugene Vinitsky · Denis Yarats
- 2021 : That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities »
  Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2021 : Graph Backup: Data Efficient Backup Exploiting Markovian Data »
  Zhengyao Jiang · Tianjun Zhang · Robert Kirk · Tim Rocktäschel · Edward Grefenstette
- 2021 : Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay »
  Iryna Korshunova · Minqi Jiang · Jack Parker-Holder · Tim Rocktäschel · Edward Grefenstette
- 2021 : On Combining Expert Demonstrations in Imitation Learning via Optimal Transport »
  Ilana Sebag · Samuel Cohen · Marc Deisenroth
- 2021 : Sliced Multi-Marginal Optimal Transport »
  Samuel Cohen · Alexander Terenin · Yannik Pitcan · Brandon Amos · Marc Deisenroth · Senanayak Sesh Kumar Karri
- 2022 : Actually Sparse Variational Gaussian Processes »
  Jake Cunningham · So Takao · Mark van der Wilk · Marc Deisenroth
- 2022 : Meta Optimal Transport »
  Brandon Amos · Samuel Cohen · Giulia Luise · Ievgen Redko
- 2022 : Short-term Prediction and Filtering of Solar Power Using State-Space Gaussian Processes »
  So Takao · Sean Nassimiha · Peter Dudfield · Jack Kelly · Marc Deisenroth
- 2022 : Efficient Planning in a Compact Latent Action Space »
  Zhengyao Jiang · Tianjun Zhang · Michael Janner · Yueying (Lisa) Li · Tim Rocktäschel · Edward Grefenstette · Yuandong Tian
- 2023 Poster: Thin and deep Gaussian processes »
  Daniel Augusto de Souza · Alexander Nikitin · ST John · Magnus Ross · Mauricio A Álvarez · Marc Deisenroth · João Paulo Gomes · Diego Mesquita · César Lincoln Mattos
- 2023 Poster: The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs »
  Laura Ruis · Akbir Khan · Stella Biderman · Sara Hooker · Tim Rocktäschel · Edward Grefenstette
- 2023 Poster: ChessGPT: Bridging Policy Learning and Language Modeling »
  Xidong Feng · Yicheng Luo · Ziyan Wang · Hongrui Tang · Mengyue Yang · Kun Shao · David Mguni · Yali Du · Jun Wang
- 2023 Poster: Cauchy–Schwarz Regularized Autoencoder »
  Linh Tran · Maja Pantic · Marc Deisenroth
- 2022 : Fair Synthetic Data Does not Necessarily Lead to Fair Models »
  Yam Eitan · Nathan Cavaglione · Michael Arbel · Samuel Cohen
- 2022 Workshop: LaReL: Language and Reinforcement Learning »
  Laetitia Teodorescu · Laura Ruis · Tristan Karch · Cédric Colas · Paul Barde · Jelena Luketina · Athul Jacob · Pratyusha Sharma · Edward Grefenstette · Jacob Andreas · Marc-Alexandre Côté
- 2022 Poster: Learning General World Models in a Handful of Reward-Free Deployments »
  Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette
- 2022 Poster: Grounding Aleatoric Uncertainty for Unsupervised Environment Design »
  Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster
- 2022 Poster: Improving Policy Learning via Language Dynamics Distillation »
  Victor Zhong · Jesse Mu · Luke Zettlemoyer · Edward Grefenstette · Tim Rocktäschel
- 2022 Poster: Improving Intrinsic Exploration with Language Abstractions »
  Jesse Mu · Victor Zhong · Roberta Raileanu · Minqi Jiang · Noah Goodman · Tim Rocktäschel · Edward Grefenstette
- 2021 : The NetHack Challenge + Q&A »
  Eric Hambro · Sharada Mohanty · Dipam Chakraborty · Edward Grefenstette · Minqi Jiang · Robert Kirk · Vitaly Kurin · Heinrich Küttler · Vegard Mella · Nantas Nardelli · Jack Parker-Holder · Roberta Raileanu · Tim Rocktäschel · Danielle Rothermel · Mikayel Samvelyan
- 2021 Poster: Replay-Guided Adversarial Environment Design »
  Minqi Jiang · Michael Dennis · Jack Parker-Holder · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2021 Poster: Vector-valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels »
  Michael Hutchinson · Alexander Terenin · Viacheslav Borovitskiy · So Takao · Yee Teh · Marc Deisenroth
- 2020 : GENNI: Visualising the Geometry of Equivalences for Neural Network Identifiability »
  Arinbjörn Kolbeinsson · Nicholas Jennings · Marc Deisenroth · Daniel Lengyel · Janith Petangoda · Michalis Lazarou · Kate Highnam · John IF Falk
- 2020 Poster: Matérn Gaussian Processes on Riemannian Manifolds »
  Viacheslav Borovitskiy · Alexander Terenin · Peter Mostowsky · Marc Deisenroth
- 2020 Poster: The NetHack Learning Environment »
  Heinrich Küttler · Nantas Nardelli · Alexander Miller · Roberta Raileanu · Marco Selvatici · Edward Grefenstette · Tim Rocktäschel
- 2020 Session: Orals & Spotlights Track 25: Probabilistic Models/Statistics »
  Marc Deisenroth · Matthew D. Hoffman
- 2020 Poster: Probabilistic Active Meta-Learning »
  Jean Kaddour · Steindor Saemundsson · Marc Deisenroth
- 2020 Tutorial: (Track1) There and Back Again: A Tale of Slopes and Expectations Q&A »
  Marc Deisenroth · Cheng Soon Ong
- 2020 : Discussion Panel: Hugo Larochelle, Finale Doshi-Velez, Devi Parikh, Marc Deisenroth, Julien Mairal, Katja Hofmann, Phillip Isola, and Michael Bowling »
  Hugo Larochelle · Finale Doshi-Velez · Marc Deisenroth · Devi Parikh · Julien Mairal · Katja Hofmann · Phillip Isola · Michael Bowling
- 2020 Tutorial: (Track1) There and Back Again: A Tale of Slopes and Expectations »
  Marc Deisenroth · Cheng Soon Ong
- 2019 : Invited Talk - Marc Deisenroth »
  Marc Deisenroth
- 2018 Poster: Gaussian Process Conditional Density Estimation »
  Vincent Dutordoir · Hugh Salimbeni · James Hensman · Marc Deisenroth
- 2018 Poster: Maximizing acquisition functions for Bayesian optimization »
  James Wilson · Frank Hutter · Marc Deisenroth
- 2018 Poster: Orthogonally Decoupled Variational Gaussian Processes »
  Hugh Salimbeni · Ching-An Cheng · Byron Boots · Marc Deisenroth
- 2017 Poster: Doubly Stochastic Variational Inference for Deep Gaussian Processes »
  Hugh Salimbeni · Marc Deisenroth
- 2017 Spotlight: Doubly Stochastic Variational Inference for Deep Gaussian Processes »
  Hugh Salimbeni · Marc Deisenroth
- 2017 Poster: Identification of Gaussian Process State Space Models »
  Stefanos Eleftheriadis · Tom Nicholson · Marc Deisenroth · James Hensman
- 2015 : Applications of Bayesian Optimization to Systems »
  Marc Deisenroth
- 2015 Poster: Teaching Machines to Read and Comprehend »
  Karl Moritz Hermann · Tomas Kocisky · Edward Grefenstette · Lasse Espeholt · Will Kay · Mustafa Suleyman · Phil Blunsom
- 2015 Poster: Learning to Transduce with Unbounded Memory »
  Edward Grefenstette · Karl Moritz Hermann · Mustafa Suleyman · Phil Blunsom
- 2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
  Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
- 2013 Workshop: Advances in Machine Learning for Sensorimotor Control »
  Thomas Walsh · Alborz Geramifard · Marc Deisenroth · Jonathan How · Jan Peters
- 2012 Poster: Expectation Propagation in Gaussian Process Dynamical Systems »
  Marc Deisenroth · Shakir Mohamed
- 2009 Workshop: Probabilistic Approaches for Control and Robotics »
  Marc Deisenroth · Hilbert J Kappen · Emo Todorov · Duy Nguyen-Tuong · Carl Edward Rasmussen · Jan Peters