
Workshop: Deep Reinforcement Learning

Invited Talk: Dale Schuurmans - Understanding Deep Value Estimation

Dale Schuurmans


Estimating long-term returns from short data trajectories remains a core technique in deep reinforcement learning. Remarkably, deep reinforcement learning in the wild often succeeds even when the theoretical assumptions needed to guarantee good performance are neglected. I will discuss two recent investigations that shed some light on this phenomenon. First, I will discuss some findings about the implicit biases embodied by different value estimation algorithms, and why apparently unsound methods can still exhibit generalization advantages. Then I will discuss some recent ideas about how the risk of self-delusion in value estimation can be reduced through temporal grounding. These observations do not close the investigation, but they do offer alternative prospects for improving deep value estimation in practice.