
Workshop: Deep Reinforcement Learning

BLAST: Latent Dynamics Models from Bootstrapping

Keiran Paster · Lev McKinney · Sheila McIlraith · Jimmy Ba


State-of-the-art world models such as DreamerV2 have significantly improved the capabilities of model-based reinforcement learning. However, these approaches typically rely on reconstruction losses to shape their latent representations of the environment, and such losses are known to fail in environments with high-fidelity visual observations. When latent dynamics models are instead trained without a reconstruction loss, using only the reward signal, the performance of these methods also drops dramatically. We present a simple modification to DreamerV2 trained without reconstruction loss, inspired by the recent self-supervised learning method Bootstrap Your Own Latent (BYOL). The combination of adding a stop-gradient to the posterior, using a powerful autoregressive model for the prior, and using a slowly updating target encoder, which we call BLAST, allows the world model to learn from signals present in both the reward and the observations. This improves efficiency on our tested environment and makes the model significantly more robust to visual distractors.
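Two of the ingredients named above come from BYOL: a slowly updating target encoder and a loss that compares a prediction against a target embedding treated as a constant. The sketch below shows only those two generic pieces in plain Python. It is an illustration under stated assumptions, not the BLAST implementation: it assumes the target encoder is an exponential moving average (EMA) of the online encoder's parameters with a hypothetical rate `tau`, and that the prior's prediction is scored with BYOL's normalized loss `2 - 2 * cosine_similarity`. The helper names `ema_update` and `byol_loss` are hypothetical. In an autodiff framework the target embedding would additionally be wrapped in a stop-gradient (for example `jax.lax.stop_gradient`), which plain Python has no need for.

```python
import math
from typing import List

Vector = List[float]


def ema_update(target: Vector, online: Vector, tau: float = 0.99) -> Vector:
    """Hypothetical slow target update: target <- tau * target + (1 - tau) * online.

    A tau close to 1 makes the target encoder change slowly, which is the
    assumed meaning of a "slowly updating target encoder" here.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def l2_normalize(v: Vector, eps: float = 1e-8) -> Vector:
    """Scale a vector to (approximately) unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def byol_loss(prediction: Vector, target_embedding: Vector) -> float:
    """BYOL-style loss between a prediction and a target embedding.

    Equals 2 - 2 * cos(prediction, target): 0 when aligned, 4 when opposite.
    The target is treated as a constant; with autodiff it would sit behind a
    stop-gradient so only the prediction branch receives gradients.
    """
    p = l2_normalize(prediction)
    z = l2_normalize(target_embedding)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


if __name__ == "__main__":
    online_params = [1.0, 1.0]
    target_params = [0.0, 0.0]
    # The target drifts toward the online parameters a little each step.
    for _ in range(3):
        target_params = ema_update(target_params, online_params, tau=0.9)
    print(target_params)
    print(byol_loss([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors give 2.0
```

Because the loss depends only on direction, rescaling the prediction leaves it unchanged; this normalization, together with the slowly moving target, is what BYOL relies on to avoid a trivially collapsed representation.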
