Timezone: »
We consider the problem of making AI agents that collaborate well with humans in partially observable fully cooperative environments given datasets of human behavior. Inspired by piKL, a human-data-regularized search method that improves upon a behavioral cloning policy without diverging far away from it, we develop a three-step algorithm that achieve strong performance in coordinating with real humans in the Hanabi benchmark. We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels. Then, we integrate the policy regularization idea into reinforcement learning to train a human-like best response to the human model. Finally, we apply regularized search on top of the best response policy at test time to handle out-of-distribution challenges when playing with humans. We evaluate our method in two large scale experiments with humans. First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams. Second, we show that our method beats a vanilla best response to behavioral cloning baseline by having experts play repeatedly with the two agents.
Author Information
Hengyuan Hu (Stanford University)
David Wu (Meta)
Adam Lerer (Facebook AI Research)
Jakob Foerster (University of Oxford)
Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.
Noam Brown (FAIR)
More from the Same Authors
-
2021 : Grounding Aleatoric Uncertainty in Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster -
2021 : No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients »
Risto Vuorio · Jacob Beck · Greg Farquhar · Jakob Foerster · Shimon Whiteson -
2021 : That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 : A Fine-Tuning Approach to Belief State Modeling »
Samuel Sokota · Hengyuan Hu · David Wu · Jakob Foerster · Noam Brown -
2021 : Generalized Belief Learning in Multi-Agent Settings »
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster -
2022 : Seq2MSA: A Language Model for Protein Sequence Diversification »
Pascal Sturmfels · Roshan Rao · Robert Verkuil · Zeming Lin · Tom Sercu · Adam Lerer · Alex Rives -
2022 : Adversarial Cheap Talk »
Chris Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 : A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games »
Samuel Sokota · Ryan D'Orazio · J. Zico Kolter · Nicolas Loizou · Marc Lanctot · Ioannis Mitliagkas · Noam Brown · Christian Kroer -
2022 : Adversarial Cheap Talk »
Chris Lu · Timon Willi · Alistair Letcher · Jakob Foerster -
2022 : Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning »
Anton Bakhtin · David Wu · Adam Lerer · Jonathan Gray · Athul Jacob · Gabriele Farina · Alexander Miller · Noam Brown -
2022 : MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning »
Mikayel Samvelyan · Akbir Khan · Michael Dennis · Minqi Jiang · Jack Parker-Holder · Jakob Foerster · Roberta Raileanu · Tim Rocktäschel -
2022 : Converging to Unexploitable Policies in Continuous Control Adversarial Games »
Maxwell Goldstein · Noam Brown -
2023 Workshop: Socially Responsible Language Modelling Research (SoLaR) »
Usman Anwar · David Krueger · Samuel Bowman · Jakob Foerster · Su Lin Blodgett · Roberta Raileanu · Alan Chan · Katherine Lee · Laura Ruis · Robert Kirk · Yawen Duan · Xin Chen · Kawin Ethayarajh -
2022 : Jakob Foerster »
Jakob Foerster -
2022 : Seq2MSA: A Language Model for Protein Sequence Diversification »
Pascal Sturmfels · Roshan Rao · Robert Verkuil · Zeming Lin · Tom Sercu · Adam Lerer · Alex Rives -
2022 Poster: Proximal Learning With Opponent-Learning Awareness »
Stephen Zhao · Chris Lu · Roger Grosse · Jakob Foerster -
2022 Poster: Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world »
Eugene Vinitsky · Nathan Lichtlé · Xiaomeng Yang · Brandon Amos · Jakob Foerster -
2022 Poster: Grounding Aleatoric Uncertainty for Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster -
2022 Poster: Off-Team Learning »
Brandon Cui · Hengyuan Hu · Andrei Lupu · Samuel Sokota · Jakob Foerster -
2022 Poster: Self-Explaining Deviations for Coordination »
Hengyuan Hu · Samuel Sokota · David Wu · Anton Bakhtin · Andrei Lupu · Brandon Cui · Jakob Foerster -
2022 Poster: Discovered Policy Optimisation »
Chris Lu · Jakub Kuba · Alistair Letcher · Luke Metz · Christian Schroeder de Witt · Jakob Foerster -
2022 Poster: Influencing Long-Term Behavior in Multiagent Reinforcement Learning »
Dong-Ki Kim · Matthew Riemer · Miao Liu · Jakob Foerster · Michael Everett · Chuangchuang Sun · Gerald Tesauro · Jonathan How -
2022 Poster: Equivariant Networks for Zero-Shot Coordination »
Darius Muglich · Christian Schroeder de Witt · Elise van der Pol · Shimon Whiteson · Jakob Foerster -
2021 Workshop: Cooperative AI »
Natasha Jaques · Edward Hughes · Jakob Foerster · Noam Brown · Kalesha Bullard · Charlotte Smith -
2021 Poster: Replay-Guided Adversarial Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 Poster: No-Press Diplomacy from Scratch »
Anton Bakhtin · David Wu · Adam Lerer · Noam Brown -
2021 Poster: K-level Reasoning for Zero-Shot Coordination in Hanabi »
Brandon Cui · Hengyuan Hu · Luis Pineda · Jakob Foerster -
2021 Poster: Neural Pseudo-Label Optimism for the Bank Loan Problem »
Aldo Pacchiano · Shaun Singh · Edward Chou · Alex Berg · Jakob Foerster -
2020 : Exploring generative atomic models in cryo-EM reconstruction »
Ellen Zhong · Adam Lerer · · Bonnie Berger -
2020 Workshop: Talking to Strangers: Zero-Shot Emergent Communication »
Marie Ossenkopf · Angelos Filos · Abhinav Gupta · Michael Noukhovitch · Angeliki Lazaridou · Jakob Foerster · Kalesha Bullard · Rahma Chaabouni · Eugene Kharitonov · Roberto Dessì -
2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster -
2020 Poster: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games »
Noam Brown · Anton Bakhtin · Adam Lerer · Qucheng Gong -
2019 : Contributed Talk - 3 »
Adam Lerer -
2019 : Poster Session »
Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn -
2019 : Extended Poster Session »
Travis LaCroix · Marie Ossenkopf · Mina Lee · Nicole Fitzgerald · Daniela Mihai · Jonathon Hare · Ali Zaidi · Alexander Cowen-Rivers · Alana Marzoev · Eugene Kharitonov · Luyao Yuan · Tomasz Korbak · Paul Pu Liang · Yi Ren · Roberto Dessì · Peter Potash · Shangmin Guo · Tatsunori Hashimoto · Percy Liang · Julian Zubek · Zipeng Fu · Song-Chun Zhu · Adam Lerer -
2019 Workshop: Emergent Communication: Towards Natural Language »
Abhinav Gupta · Michael Noukhovitch · Cinjon Resnick · Natasha Jaques · Angelos Filos · Marie Ossenkopf · Angeliki Lazaridou · Jakob Foerster · Ryan Lowe · Douwe Kiela · Kyunghyun Cho -
2019 Poster: PyTorch: An Imperative Style, High-Performance Deep Learning Library »
Adam Paszke · Sam Gross · Francisco Massa · Adam Lerer · James Bradbury · Gregory Chanan · Trevor Killeen · Zeming Lin · Natalia Gimelshein · Luca Antiga · Alban Desmaison · Andreas Kopf · Edward Yang · Zachary DeVito · Martin Raison · Alykhan Tejani · Sasank Chilamkurthy · Benoit Steiner · Lu Fang · Junjie Bai · Soumith Chintala -
2019 Poster: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning »
Gregory Farquhar · Shimon Whiteson · Jakob Foerster -
2019 Poster: Multi-Agent Common Knowledge Reinforcement Learning »
Christian Schroeder de Witt · Jakob Foerster · Gregory Farquhar · Philip Torr · Wendelin Boehmer · Shimon Whiteson -
2019 Poster: Robust Multi-agent Counterfactual Prediction »
Alexander Peysakhovich · Christian Kroer · Adam Lerer -
2019 Poster: Hierarchical Decision Making by Generating and Following Natural Language Instructions »
Hengyuan Hu · Denis Yarats · Qucheng Gong · Yuandong Tian · Mike Lewis -
2018 Workshop: Emergent Communication Workshop »
Jakob Foerster · Angeliki Lazaridou · Ryan Lowe · Igor Mordatch · Douwe Kiela · Kyunghyun Cho -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang -
2017 Workshop: Emergent Communication Workshop »
Jakob Foerster · Igor Mordatch · Angeliki Lazaridou · Kyunghyun Cho · Douwe Kiela · Pieter Abbeel -
2017 Demonstration: Libratus: Beating Top Humans in No-Limit Poker »
Noam Brown · Tuomas Sandholm -
2017 Poster: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm -
2017 Oral: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm -
2016 Workshop: Intuitive Physics »
Adam Lerer · Jiajun Wu · Josh Tenenbaum · Emmanuel Dupoux · Rob Fergus -
2016 Poster: Learning to Communicate with Deep Multi-Agent Reinforcement Learning »
Jakob Foerster · Yannis Assael · Nando de Freitas · Shimon Whiteson