In complex reinforcement learning (RL) problems, policies with similar rewards may exhibit substantially different behaviors. Optimizing reward while discovering as many diverse strategies as possible remains a fundamental challenge, and one that is crucial in many practical applications. Our study examines two design choices for tackling this challenge: the diversity measure and the computation framework. First, we find that under existing diversity measures, even visually indistinguishable policies can yield high diversity scores. To accurately capture behavioral differences, we propose incorporating state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem: population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR achieves comparable diversity scores with higher computational efficiency, leading to better solution quality in practice. Based on this analysis, we combine ITR with two tractable realizations of the state-distance-based diversity measure and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically evaluate SIPO across three domains, from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.
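The paper's algorithm is not reproduced on this page, but the core idea of a state-distance-based intrinsic reward under iterative learning can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function name `state_distance_bonus`, the Euclidean state distance, and the clipping `threshold` are hypothetical choices, not the paper's actual realizations.

```python
import numpy as np

def state_distance_bonus(states, archive, threshold=1.0):
    """Hypothetical intrinsic reward: bonus for visiting states far from
    those already covered by previously discovered policies.

    states:    array of shape (T, d), states visited by the current policy
    archive:   list of (T_i, d) arrays, trajectories of archived policies
    threshold: cap on the bonus so it stays bounded
    """
    if not archive:
        # First iteration: no archived policies yet, so no diversity bonus.
        return np.zeros(len(states))
    bonuses = []
    for s in states:
        # Distance to the closest state produced by any archived policy.
        d = min(np.linalg.norm(s - a) for traj in archive for a in traj)
        bonuses.append(min(d, threshold))
    return np.array(bonuses)
```

In an iterative-learning loop, each iteration would train a fresh policy on the environment reward plus this bonus, then append that policy's visited states to `archive` before the next iteration, so later policies are pushed toward behaviors the archive does not yet cover.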
Author Information
Wei Fu (Institute for Interdisciplinary Information Sciences, Tsinghua University)
Weihua Du (IIIS, Tsinghua University)
Jingwei Li (Tsinghua University)
Sunli Chen (Tsinghua University)
Jingzhao Zhang (Tsinghua University)
YI WU (Shanghai Qi Zhi Institute & Tsinghua University)
More from the Same Authors
- 2021: Learning Design and Construction with Varying-Sized Materials via Prioritized Memory Resets
  Yunfei Li · Lei Li · YI WU
- 2021: Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
  Zihan Zhou · Wei Fu · Bingliang Zhang · Yi Wu
- 2022 Poster: Grounded Reinforcement Learning: Learning to Win the Game under Human Commands
  Shusheng Xu · Huaijie Wang · YI WU
- 2022 Poster: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning
  Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · YI WU · Yang Gao · Huazhe Xu
- 2022: Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
  Runlong Zhou · Yuandong Tian · YI WU · Simon Du
- 2022: Online Policy Optimization for Robust MDP
  Jing Dong · Jingwei Li · Baoxiang Wang · Jingzhao Zhang
- 2023: Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm
  Peiyuan Zhang · Jingzhao Zhang · Suvrit Sra
- 2023: Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing
  Ying Yuan · Haichuan Che · Yuzhe Qin · Binghao Huang · Zhao-Heng Yin · YI WU · Xiaolong Wang
- 2023: A Quadratic Synchronization Rule for Distributed Deep Learning
  Xinran Gu · Kaifeng Lyu · Sanjeev Arora · Jingzhao Zhang · Longbo Huang
- 2023: Building Cooperative Embodied Agents Modularly with Large Language Models
  Hongxin Zhang · Weihua Du · Jiaming Shan · Qinhong Zhou · Yilun Du · Josh Tenenbaum · Tianmin Shu · Chuang Gan
- 2023 Poster: On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective
  Zeke Xie · Zhiqiang Xu · Jingzhao Zhang · Issei Sato · Masashi Sugiyama
- 2023 Poster: Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions
  Xiang Cheng · Bohan Wang · Jingzhao Zhang · Yusong Zhu
- 2022 Spotlight: Lightning Talks 5A-3
  Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng
- 2022 Spotlight: Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning
  Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · YI WU · Yang Gao · Huazhe Xu
- 2022 Poster: Efficient Sampling on Riemannian Manifolds via Langevin MCMC
  Xiang Cheng · Jingzhao Zhang · Suvrit Sra
- 2022 Poster: The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games
  Chao Yu · Akash Velu · Eugene Vinitsky · Jiaxuan Gao · Yu Wang · Alexandre Bayen · YI WU
- 2020 Poster: Multi-Task Reinforcement Learning with Soft Modularization
  Ruihan Yang · Huazhe Xu · YI WU · Xiaolong Wang
- 2018 Poster: Meta-Learning MCMC Proposals
  Tongzhou Wang · YI WU · Dave Moore · Stuart Russell
- 2017 Poster: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
  Ryan Lowe · YI WU · Aviv Tamar · Jean Harb · Pieter Abbeel · Igor Mordatch
- 2016 Poster: Value Iteration Networks
  Aviv Tamar · Sergey Levine · Pieter Abbeel · YI WU · Garrett Thomas
- 2016 Oral: Value Iteration Networks
  Aviv Tamar · Sergey Levine · Pieter Abbeel · YI WU · Garrett Thomas
- 2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming
  Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden