Timezone: »
In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents. While the tractability of independent agent-wise exploration is appealing, this approach fails on tasks that require elaborate group strategies. We argue that coordinating the agents' policies can guide their exploration and we investigate techniques to promote such an inductive bias. We propose two policy regularization methods: TeamReg, which is based on inter-agent action predictability and CoachReg that relies on synchronized behavior selection. We evaluate each approach on four challenging continuous control tasks with sparse rewards that require varying levels of coordination as well as on the discrete action Google Research Football environment. Our experiments show improved performance across many cooperative multi-agent problems. Finally, we analyze the effects of our proposed methods on the policies that our agents learn and show that our methods successfully enforce the qualities that we propose as proxies for coordinated behaviors.
Author Information
Julien Roy (Mila)
Paul Barde (Quebec AI institute - Mila, McGill)
Félix Harvey (Polytechnique Montréal)
Derek Nowrouzezahrai (McGill University)
Chris Pal (MILA, Polytechnique Montréal, Element AI)
More from the Same Authors
-
2022 Workshop: LaReL: Language and Reinforcement Learning »
Laetitia Teodorescu · Laura Ruis · Tristan Karch · Cédric Colas · Paul Barde · Jelena Luketina · Athul Jacob · Pratyusha Sharma · Edward Grefenstette · Jacob Andreas · Marc-Alexandre Côté -
2022 Poster: Attention-based Neural Cellular Automata »
Mattie Tesfaldet · Derek Nowrouzezahrai · Chris Pal -
2020 Workshop: Differentiable computer vision, graphics, and physics in machine learning »
Krishna Murthy Jatavallabhula · Kelsey Allen · Victoria Dean · Johanna Hansen · Shuran Song · Florian Shkurti · Liam Paull · Derek Nowrouzezahrai · Josh Tenenbaum -
2020 Poster: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization »
Paul Barde · Julien Roy · Wonseok Jeon · Joelle Pineau · Chris Pal · Derek Nowrouzezahrai -
2020 Spotlight: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization »
Paul Barde · Julien Roy · Wonseok Jeon · Joelle Pineau · Chris Pal · Derek Nowrouzezahrai -
2019 Poster: Neural Multisensory Scene Inference »
Jae Hyun Lim · Pedro O. Pinheiro · Negar Rostamzadeh · Chris Pal · Sungjin Ahn -
2019 Poster: On Adversarial Mixup Resynthesis »
Christopher Beckham · Sina Honari · Alex Lamb · Vikas Verma · Farnoosh Ghadiri · R Devon Hjelm · Yoshua Bengio · Chris Pal -
2018 Poster: Towards Deep Conversational Recommendations »
Raymond Li · Samira Ebrahimi Kahou · Hannes Schulz · Vincent Michalski · Laurent Charlin · Chris Pal -
2018 Poster: Unsupervised Depth Estimation, 3D Face Rotation and Replacement »
Joel Ruben Antony Moniz · Christopher Beckham · Simon Rajotte · Sina Honari · Chris Pal -
2018 Poster: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Nan Rosemary Ke · Anirudh Goyal · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio -
2018 Spotlight: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Nan Rosemary Ke · Anirudh Goyal · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio -
2018 Poster: Towards Text Generation with Adversarially Learned Neural Outlines »
Sandeep Subramanian · Sai Rajeswar Mudumba · Alessandro Sordoni · Adam Trischler · Aaron Courville · Chris Pal -
2017 Poster: ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events »
Evan Racah · Christopher Beckham · Tegan Maharaj · Samira Ebrahimi Kahou · Mr. Prabhat · Chris Pal