Skip to yearly menu bar Skip to main content

Workshop: Deep Reinforcement Learning

No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

Risto Vuorio · Jacob Beck · Greg Farquhar · Jakob Foerster · Shimon Whiteson


Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, as well as tackling the problem of credit assignment to pre-adaptation behavior by making a sampling correction. However, we show that Hessian estimation, implemented for example by DiCE and its variants, always add bias and can also add variance to meta-gradient estimation. DiCE-like approaches are therefore unlikely to lie on Pareto frontier of the bias-variance tradeoff and should not be pursued in the context of meta-gradients for RL. Meanwhile, the sampling correction has not been studied in the important long-horizon setting, where the inner optimization trajectories must be truncated for computational tractability. We study the bias and variance tradeoff induced by truncated backpropagation in combination with a weighted sampling correction. While prior work has implicitly chosen points in this bias-variance space, we disentangle the sources of bias and variance and present an empirical study which relates existing estimators to each other.

Chat is not available.