Timezone: »

Concept-based Understanding of Emergent Multi-Agent Behavior
Niko Grupen · Shayegan Omidshafiei · Natasha Jaques · Been Kim
Event URL: https://openreview.net/forum?id=zt5JpGQ8WhH »

This work studies concept-based interpretability in the context of multi-agent learning. Unlike supervised learning, where there have been efforts to understand a model's decisions, multi-agent interpretability remains under-investigated. This is in part due to the increased complexity of the multi-agent setting---interpreting the decisions of multiple agents over time is combinatorially more complex than understanding individual, static decisisons---but is also a reflection of the limited availability of tools for understanding multi-agent behavior. Interactions between agents, and coordination generally, remain difficult to gauge in MARL. In this work, we propose Concept Bottleneck Policies (CBPs) as a method for learning intrinsically interpretable, concept-based policies with MARL. We demonstrate that, by conditioning each agent's action on a set of human-understandable concepts, our method enables post-hoc behavioral analysis via concept intervention that is infeasible with standard policy architectures. Experiments show that concept interventions over CBPs reliably detect when agents have learned to coordinate with each other in environments that do not demand coordination, and detect those environments in which coordination is required. Moreover, we find evidence that CBPs can detect coordination failures (such as lazy agents) and expose the low-level inter-agent information that underpins emergent coordination. Finally, we demonstrate that our approach matches the performance of standard, non-concept-based policies; thereby achieving interpretability without sacrificing performance.

Author Information

Niko Grupen (Cornell University)
Shayegan Omidshafiei (Google)
Natasha Jaques (Google Brain, UC Berkeley)

Natasha Jaques holds a joint position as a Research Scientist at Google Brain and Postdoctoral Fellow at UC Berkeley. Her research focuses on Social Reinforcement Learning in multi-agent and human-AI interactions. Natasha completed her PhD at MIT, where her thesis received the Outstanding PhD Dissertation Award from the Association for the Advancement of Affective Computing. Her work has also received Best Demo at NeurIPS, an honourable mention for Best Paper at ICML, Best of Collection in the IEEE Transactions on Affective Computing, and Best Paper at the NeurIPS workshops on ML for Healthcare and Cooperative AI. She has interned at DeepMind, Google Brain, and was an OpenAI Scholars mentor. Her work has been featured in Science Magazine, Quartz, MIT Technology Review, Boston Magazine, and on CBC radio. Natasha earned her Masters degree from the University of British Columbia, and undergraduate degrees in Computer Science and Psychology from the University of Regina.

Been Kim (Google Brain)

More from the Same Authors