ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning
Tung Nguyen · Qinqing Zheng · Aditya Grover
Event URL: https://openreview.net/forum?id=Ed4X0HxYFfH

The goal of offline reinforcement learning (RL) is to learn near-optimal policies from static logged datasets, thus sidestepping expensive online interactions. Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al., 2021; Emmons et al., 2021) have shown that by conditioning on desired future returns, BC can perform competitively with its value-based counterparts while being considerably simpler and more stable to train. However, the distribution of returns in the offline dataset can be arbitrarily skewed and suboptimal, which poses a unique challenge for conditioning BC on expert returns at test time. We propose ConserWeightive Behavioral Cloning (CWBC), a simple and effective method for improving the performance of conditional BC for offline RL with two key components: trajectory weighting and conservative regularization. Trajectory weighting addresses the bias-variance tradeoff in conditional BC and provides a principled mechanism for learning from both low-return trajectories (typically plentiful) and high-return trajectories (typically few). Further, we analyze the notion of conservatism in existing BC methods and propose a novel conservative regularizer that explicitly encourages the policy to stay close to the data distribution. The regularizer yields more reliable performance and removes the need for ad-hoc tuning of the conditioning value during evaluation. We instantiate CWBC in the context of Reinforcement Learning via Supervised Learning (RvS) (Emmons et al., 2021) and Decision Transformer (DT) (Chen et al., 2021), and empirically show that it significantly boosts the performance and stability of prior methods on various offline RL benchmarks.
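To make the two components more concrete, below is a minimal Python sketch of what return-weighted trajectory sampling and a conservative conditioning regularizer could look like. The softmax-style weighting, the `policy(states, returns_to_go)` interface, and the upward return perturbation are illustrative assumptions for exposition, not the exact formulation from the paper.

```python
# Illustrative sketch of the two components described in the abstract.
# NOTE: the weighting scheme and the regularizer below are simplified
# stand-ins, not the paper's exact formulas.
import numpy as np
import torch


def trajectory_weights(returns, temperature=1.0):
    """Upweight scarce high-return trajectories while keeping support on
    the plentiful low-return ones (hypothetical softmax-by-return weights)."""
    r = np.asarray(returns, dtype=np.float64)
    # Subtracting the max keeps the exponent non-positive (numerically stable).
    z = (r - r.max()) / (temperature * (r.std() + 1e-8))
    w = np.exp(z)
    return w / w.sum()


def sample_batch(trajectories, returns, batch_size, temperature=1.0):
    """Draw a training batch with probability proportional to the
    trajectory weights instead of sampling uniformly."""
    p = trajectory_weights(returns, temperature)
    idx = np.random.choice(len(trajectories), size=batch_size, p=p)
    return [trajectories[i] for i in idx]


def conservative_bc_loss(policy, states, actions, returns_to_go, noise_scale=0.1):
    """Standard BC loss plus a conservative term: when conditioned on
    returns perturbed beyond the data distribution, the policy should
    still predict the in-distribution actions."""
    pred = policy(states, returns_to_go)
    bc_loss = torch.mean((pred - actions) ** 2)
    # Perturb conditioning returns upward, past what the data supports.
    perturbed = returns_to_go + noise_scale * torch.rand_like(returns_to_go) * returns_to_go.abs()
    pred_ood = policy(states, perturbed)
    reg = torch.mean((pred_ood - actions) ** 2)
    return bc_loss + reg
```

In this sketch, the regularizer ties the policy's behavior under out-of-distribution conditioning values back to observed actions, which is one way to realize the abstract's claim that the method removes the need for ad-hoc tuning of the conditioning value at evaluation time.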

Author Information

Tung Nguyen (University of California, Los Angeles)
Qinqing Zheng (Facebook AI Research)
Aditya Grover (University of California, Los Angeles)
