Workshop: Deep Reinforcement Learning

Long-Term Credit Assignment via Model-based Temporal Shortcuts

Michel Ma · Pierluca D'Oro · Yoshua Bengio · Pierre-Luc Bacon


This work explores the question of long-term credit assignment in reinforcement learning. Assigning credit over long distances has historically been difficult in both reinforcement learning and recurrent neural networks, where discounting and gradient truncation, respectively, are often necessary for feasibility but limit the model's ability to reason over longer time scales. We propose LVGTS, a novel model-based algorithm that bridges the gap between the two fields. By backpropagating through a latent model and using temporal shortcuts to propagate gradients directly, LVGTS assigns credit from the future to the possibly distant past regardless of the use of discounting or gradient truncation. We show, on simple but carefully designed problems, that our approach is able to perform effective credit assignment even in the presence of distractions.
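
To make the limitation mentioned above concrete, the following minimal sketch shows how quickly a discount factor shrinks the weight of a delayed reward in a discounted return. It illustrates standard discounting only and is not the LVGTS algorithm; the function names `discounted_weight` and `effective_horizon` and the value gamma = 0.99 are assumptions chosen for illustration, not taken from the paper.

```python
def discounted_weight(gamma: float, delay: int) -> float:
    """Weight gamma**delay given to a reward that arrives `delay` steps later."""
    return gamma ** delay


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma) implied by a discount factor."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    gamma = 0.99  # illustrative value, not from the paper
    print("effective horizon:", effective_horizon(gamma))
    for delay in (10, 100, 1000):
        # The weight decays geometrically with the delay.
        print(delay, discounted_weight(gamma, delay))
```

With a discount factor of this kind, a reward a thousand steps away contributes almost nothing to the return, which is the sense in which discounting limits reasoning over longer time scales.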