Grounding Aleatoric Uncertainty in Unsupervised Environment Design
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster
Event URL: https://openreview.net/forum?id=o8_QHMYOfu

In reinforcement learning (RL), adaptive curricula have proven highly effective for learning policies that generalize well under a wide variety of changes to the environment. Recently, the framework of Unsupervised Environment Design (UED) generalized notions of curricula for RL in terms of generating entire environments, leading to the development of new methods with robust minimax-regret properties. However, in partially-observable or stochastic settings (those featuring aleatoric uncertainty), optimal policies may depend on the ground-truth distribution over the aleatoric features of the environment. Such settings are potentially problematic for curriculum learning, which necessarily shifts the environment distribution used during training with respect to the fixed ground-truth distribution in the intended deployment environment. We formalize this phenomenon as curriculum-induced covariate shift, and describe how, when the distribution shift occurs over such aleatoric environment parameters, it can lead to learning suboptimal policies. We then propose a method which, given black-box access to a simulator, corrects this resultant bias by aligning the advantage estimates to the ground-truth distribution over aleatoric parameters. This approach leads to a minimax-regret UED method, SAMPLR, with Bayes-optimal guarantees.
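As a rough illustration of curriculum-induced covariate shift, consider a toy one-step betting task. This is not taken from the paper, and every name and number in it is an assumption made for the example. A coin's bias plays the role of the aleatoric parameter: the deployment environment has a fixed probability of heads, while a hypothetical curriculum oversamples heads. The sketch shows that the action that looks best under the curriculum differs from the one that is best under the ground truth. It then applies plain importance reweighting to recover the ground-truth value. That reweighting is a simple stand-in used only to make the bias visible, not the SAMPLR correction described in the abstract.

```python
# Toy illustration only: all constants and function names are hypothetical.
GROUND_TRUTH_P_HEADS = 0.3   # assumed deployment distribution over the coin
CURRICULUM_P_HEADS = 0.8     # assumed shifted distribution used in training

ACTIONS = ("heads", "tails")


def expected_return(action, p_heads):
    """Expected return of betting on `action`: +1 if correct, -1 otherwise."""
    p_win = p_heads if action == "heads" else 1.0 - p_heads
    return p_win * 1.0 + (1.0 - p_win) * -1.0


def best_action(p_heads):
    """Action with the highest expected return when P(heads) = p_heads."""
    return max(ACTIONS, key=lambda a: expected_return(a, p_heads))


def reweighted_return(action, p_train, p_true):
    """Expected return under the training distribution, with each outcome
    importance-weighted by p_true / p_train (a stand-in, not SAMPLR)."""
    total = 0.0
    outcomes = (("heads", p_train, p_true), ("tails", 1.0 - p_train, 1.0 - p_true))
    for outcome, q, p in outcomes:
        reward = 1.0 if action == outcome else -1.0
        total += q * (p / q) * reward
    return total


if __name__ == "__main__":
    print("best under curriculum:  ", best_action(CURRICULUM_P_HEADS))
    print("best under ground truth:", best_action(GROUND_TRUTH_P_HEADS))
    for a in ACTIONS:
        value = reweighted_return(a, CURRICULUM_P_HEADS, GROUND_TRUTH_P_HEADS)
        print(f"reweighted value of betting {a}: {value:.2f}")
```

In this toy setting, a policy trained naively on the curriculum would prefer betting heads, even though betting tails is optimal at deployment. Value estimates aligned to the ground-truth distribution rank the actions correctly.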

Author Information

Minqi Jiang (UCL & FAIR)
Michael Dennis (University of California Berkeley)

Michael Dennis is a fifth-year graduate student at the Center for Human-Compatible AI. With a background in theoretical computer science, he is working to close the gap between decision-theoretic and game-theoretic recommendations and the current state-of-the-art approaches to robust RL and multi-agent RL. The overall aim of this work is to ensure that our systems behave in a way that is robustly beneficial. In the single-agent setting, this means making decisions and managing risk in the way the designer intends. In the multi-agent setting, this means ensuring that the concerns of the designer and those of others in the society are fairly and justly negotiated to the benefit of all involved.

Jack Parker-Holder (University of Oxford)
Andrei Lupu (McGill University)
Heinrich Kuttler (FAIR)
Edward Grefenstette (Facebook AI Research & University College London)
Tim Rocktäschel (Facebook AI Research)
Jakob Foerster (University of Oxford)

Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.
