Timezone: »

BLaDE: Robust Exploration via Diffusion Models
Bilal Piot · Zhaohan Guo · Shantanu Thakoor · Mohammad Gheshlaghi Azar
Event URL: https://openreview.net/forum?id=cLijWz05L2b »

We present Bootstrap your own Latents with Diffusion models for Exploration (BLaDE), a general approach for curiosity-driven exploration in complex, partially-observable and stochastic environments. BLaDE is a natural extension of Bootstrap Your Own Latents for Exploration (BYOL-Explore) which is a multi-step prediction-error method at the latent level that learns a world representation, the world dynamics, and provides an intrinsic-reward all-together by optimizing a single prediction loss with no additional auxiliary objective. Contrary to BYOL-Explore that predicts future latents from past latents and future open-loop actions, BLaDE predicts, via a diffusion model, future latents from past observations, future open-loop actions and a noisy version of future latents. Leaking information about future latents allows to obtain an intrinsic reward that does not depend on the variance of the distribution of future latents which makes the method agnostic to stochastic traps. Our experiments on different noisy versions of Montezuma's Revenge show that BLaDE handles stochasticity better than Random Network Distillation, Intrinsic Curiosity Module and BYOL-Explore without degrading the performance of BYOL-Explore in the non-noisy and fairly deterministic

Author Information

Bilal Piot (DeepMind)
Zhaohan Guo (DeepMind)
Shantanu Thakoor (Google)
Mohammad Gheshlaghi Azar (DeepMind)

More from the Same Authors