Skip to yearly menu bar Skip to main content

Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad"

Raisin: Residual Algorithms for Versatile Offline Reinforcement Learning

Braham Snyder · Yuke Zhu

Abstract: The residual gradient algorithm (RG), gradient descent of the Mean Squared Bellman Error, brings robust convergence guarantees to bootstrapped value estimation. Meanwhile, the far more common semi-gradient algorithm (SG) suffers from well-known instabilities and divergence. Unfortunately, RG often converges slowly in practice. Baird (1995) proposed residual algorithms (RA), weighted averaging of RG and SG, to combine RG's robust convergence and SG's speed. RA works moderately well in the online setting. We find, however, that RA works disproportionately well in the offline setting. Concretely, we find that merely adding a variable residual component to SAC increases its score on D4RL gym tasks by a median factor of 54. We further show that using the minimum of ten critics lets our algorithm match SAC-$N$'s state-of-the-art returns using 50$\times$ less compute and no additional hyperparameters. In contrast, TD3+BC with the same minimum-of-ten-critics trick does not match SAC-$N$'s returns on a handful of environments.

Chat is not available.