StarCraft II Unplugged: Large Scale Offline Reinforcement Learning
Michael Mathieu · Sherjil Ozair · Srivatsan Srinivasan · Caglar Gulcehre · Shangtong Zhang · Ray Jiang · Tom Paine · Konrad Żołna · Julian Schrittwieser · David Choi · Petko I Georgiev · Daniel Toyama · Roman Ring · Igor Babuschkin · Timo Ewalds · Aaron van den Oord · Wojciech Czarnecki · Nando de Freitas · Oriol Vinyals
Event URL: https://openreview.net/forum?id=Np8Pumfoty

StarCraft II is one of the most challenging reinforcement learning (RL) environments; it is partially observable, stochastic, and multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active human competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that dataset to establish a benchmark, which we call StarCraft II Unplugged, that introduces unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard’s release), tools standardising an API for ML methods, and an evaluation protocol. We also present baseline agents, including behaviour cloning and offline variants of V-trace actor-critic and MuZero. We find that the variants of those algorithms with behaviour value estimation and single-step policy improvement work best and exceed a 90% win rate against previously published AlphaStar behaviour cloning agents.

Author Information

Michael Mathieu (DeepMind)
Sherjil Ozair (DeepMind)
Srivatsan Srinivasan (Google)
Caglar Gulcehre (DeepMind)
Shangtong Zhang (University of Oxford)
Ray Jiang (DeepMind)
Tom Paine
Konrad Żołna (DeepMind)
Julian Schrittwieser (DeepMind)
David Choi (DeepMind)
Petko I Georgiev (Google DeepMind)
Daniel Toyama (DeepMind)
Roman Ring (University of Tartu)
Igor Babuschkin (DeepMind)
Timo Ewalds (DeepMind)
Aaron van den Oord (Google DeepMind)
Wojciech Czarnecki (DeepMind)
Nando de Freitas (UBC)
Oriol Vinyals (Google DeepMind)

More from the Same Authors