The goal of offline reinforcement learning (RL) is to learn near-optimal policies from static logged datasets, thus sidestepping expensive online interactions. Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning. Recent advances~\citep{chen2021decision, janner2021offline, emmons2021rvs} have shown that by conditioning on desired future returns, BC can perform competitively with value-based methods while being simpler and more stable to train. However, the distribution of returns in the offline dataset can be arbitrarily skewed and suboptimal, which poses a unique challenge when conditioning BC on expert returns at test time. We propose ConserWeightive Behavioral Cloning (\name), a simple and effective method for improving conditional BC for offline RL with two key components: trajectory weighting and conservative regularization. Trajectory weighting addresses the bias-variance tradeoff in conditional BC and provides a principled mechanism for learning from both low-return trajectories (typically plentiful) and high-return trajectories (typically few). Further, we analyze the notion of conservatism in existing BC methods and propose a novel conservative regularizer that explicitly encourages the policy to stay close to the data distribution. The regularizer yields more reliable performance and removes the need for ad hoc tuning of the conditioning value during evaluation. We instantiate \name{} in the context of Reinforcement Learning via Supervised Learning (RvS)~\citep{emmons2021rvs} and Decision Transformer (DT)~\citep{chen2021decision}, and empirically show that it significantly boosts the performance and stability of prior methods on various offline RL benchmarks.
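The abstract describes the two components only at a high level. As a rough illustration, the following is a minimal, hypothetical PyTorch sketch of return-conditioned BC with exponential trajectory weighting and a conservative penalty applied to inflated target returns. The temperature `tau`, the return offset `kappa`, the loss weight `lam`, the network architecture, and all function names are assumptions made here for illustration, not the paper's exact formulation.

```python
import numpy as np
import torch
import torch.nn as nn

def trajectory_weights(returns, tau=1.0):
    """Sampling weights over trajectories, exponential in trajectory return.

    High-return (rare) trajectories are upweighted relative to low-return
    (plentiful) ones. `tau` trades off bias (small tau: near-greedy
    reweighting) against variance (large tau: near-uniform sampling).
    """
    z = (np.asarray(returns) - np.max(returns)) / tau  # stabilized exponent
    w = np.exp(z)
    return w / w.sum()

class ReturnConditionedPolicy(nn.Module):
    """MLP policy that maps (state, target return-to-go) to an action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, rtg):
        # rtg has shape (batch, 1); concatenate it to the state features.
        return self.net(torch.cat([state, rtg], dim=-1))

def cwbc_style_loss(policy, state, rtg, action, kappa=1.0, lam=0.1):
    """BC regression loss plus an assumed form of conservative regularizer.

    The conservative term conditions on an inflated, out-of-distribution
    target return (rtg + kappa) and penalizes the predicted action for
    drifting away from the logged action, keeping the policy close to the
    data distribution when queried with optimistic returns.
    """
    bc = ((policy(state, rtg) - action) ** 2).mean()
    conservative = ((policy(state, rtg + kappa) - action) ** 2).mean()
    return bc + lam * conservative
```

In this sketch, training would sample minibatches of trajectories in proportion to `trajectory_weights(returns)` and minimize `cwbc_style_loss`; at evaluation, the policy is conditioned on a high target return, which the conservative term is intended to make reliable without per-task tuning of the conditioning value.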
Author Information
Tung Nguyen (University of California, Los Angeles)
Qinqing Zheng (Facebook AI Research)
Aditya Grover (University of California, Los Angeles)