Timezone: »
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
Author Information
Noam Brown (Facebook AI Research)
Anton Bakhtin (Facebook AI Research)
Adam Lerer (Facebook AI Research)
Qucheng Gong (Facebook AI Research)
More from the Same Authors
-
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Joint Policy Search for Multi-agent Collaboration with Imperfect Information »
Yuandong Tian · Qucheng Gong · Yu Jiang -
2019 Poster: PyTorch: An Imperative Style, High-Performance Deep Learning Library »
Adam Paszke · Sam Gross · Francisco Massa · Adam Lerer · James Bradbury · Gregory Chanan · Trevor Killeen · Zeming Lin · Natalia Gimelshein · Luca Antiga · Alban Desmaison · Andreas Kopf · Edward Yang · Zachary DeVito · Martin Raison · Alykhan Tejani · Sasank Chilamkurthy · Benoit Steiner · Lu Fang · Junjie Bai · Soumith Chintala -
2019 Poster: PHYRE: A New Benchmark for Physical Reasoning »
Anton Bakhtin · Laurens van der Maaten · Justin Johnson · Laura Gustafson · Ross Girshick -
2019 Poster: Robust Multi-agent Counterfactual Prediction »
Alexander Peysakhovich · Christian Kroer · Adam Lerer -
2018 Poster: Depth-Limited Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm · Brandon Amos -
2017 Demonstration: Libratus: Beating Top Humans in No-Limit Poker »
Noam Brown · Tuomas Sandholm -
2017 Poster: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm -
2017 Oral: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm -
2016 Workshop: Intuitive Physics »
Adam Lerer · Jiajun Wu · Josh Tenenbaum · Emmanuel Dupoux · Rob Fergus -
2015 Poster: Regret-Based Pruning in Extensive-Form Games »
Noam Brown · Tuomas Sandholm -
2015 Demonstration: Claudico: The World's Strongest No-Limit Texas Hold'em Poker AI »
Noam Brown · Tuomas Sandholm