Timezone: »
Poster
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
Bo Liu · Xidong Feng · Jie Ren · Luo Mai · Rui Zhu · Haifeng Zhang · Jun Wang · Yaodong Yang
Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations. In this paper, we develop a unified framework that describes variations of GMRL algorithms and points out that existing stochastic meta-gradient estimators adopted by GMRL are actually \textbf{biased}. Such meta-gradient bias comes from two sources: 1) the compositional bias incurred by the two-level problem structure, which has an upper bound of $\mathcal{O}\big(K\alpha^{K}\hat{\sigma}_{\text{In}}|\tau|^{-0.5}\big)$ \emph{w.r.t.} inner-loop update step $K$, learning rate $\alpha$, estimate variance $\hat{\sigma}^{2}_{\text{In}}$ and sample size $|\tau|$, and 2) the multi-step Hessian estimation bias $\hat{\Delta}_{H}$ due to the use of autodiff, which has a polynomial impact $\mathcal{O}\big((K-1)(\hat{\Delta}_{H})^{K-1}\big)$ on the meta-gradient bias. We study tabular MDPs empirically and offer quantitative evidence that testifies our theoretical findings on existing stochastic meta-gradient estimators. Furthermore, we conduct experiments on Iterated Prisoner's Dilemma and Atari games to show how other methods such as off-policy learning and low-bias estimator can help fix the gradient bias for GMRL algorithms in general.
Author Information
Bo Liu (Peking University)
Xidong Feng (University College London)
Jie Ren (University of Edinburgh, University of Edinburgh)
Luo Mai (University of Edinburgh, University of Edinburgh)
Rui Zhu (DeepMind)
Haifeng Zhang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences)
Jun Wang (UCL)
Yaodong Yang (AIG)
More from the Same Authors
-
2022 Poster: Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning »
Runze Liu · Fengshuo Bai · Yali Du · Yaodong Yang -
2022 Poster: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Poster: M2N: Mesh Movement Networks for PDE Solvers »
Wenbin Song · Mingrui Zhang · Joseph G Wallwork · Junpeng Gao · Zheng Tian · Fanglei Sun · Matthew Piggott · Junqing Chen · Zuoqiang Shi · Xiang Chen · Jun Wang -
2022 Poster: Constrained Update Projection Approach to Safe Policy Optimization »
Long Yang · Jiaming Ji · Juntao Dai · Linrui Zhang · Binbin Zhou · Pengfei Li · Yaodong Yang · Gang Pan -
2022 Poster: A Unified Diversity Measure for Multiagent Reinforcement Learning »
Zongkai Liu · Chao Yu · Yaodong Yang · peng sun · Zifan Wu · Yuan Li -
2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 : TorchOpt: An Efficient library for Differentiable Optimization »
Jie Ren · Xidong Feng · Bo Liu · Xuehai Pan · Yao Fu · Luo Mai · Yaodong Yang -
2022 : Contextual Transformer for Offline Meta Reinforcement Learning »
Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang -
2023 : MultiReAct: Multimodal Tools Augmented Reasoning-Acting Traces for Embodied Agent Planning »
Zhouliang Yu · Jie Fu · Yao Mu · Chenguang Wang · Lin Shao · Yaodong Yang -
2023 : MultiReAct: Multimodal Tools Augmented Reasoning-Acting Traces for Embodied Agent Planning »
Zhouliang Yu · Jie Fu · Yao Mu · Chenguang Wang · Lin Shao · Yaodong Yang -
2023 : Towards End-to-end 4-Bit Inference on Generative Large Language Models »
Saleh Ashkboos · Ilia Markov · Elias Frantar · Tingxuan Zhong · Xincheng Wang · Jie Ren · Torsten Hoefler · Dan Alistarh -
2023 : Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training »
Xidong Feng · Ziyu Wan · Muning Wen · Ying Wen · Weinan Zhang · Jun Wang -
2023 : Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training »
Xidong Feng · Ziyu Wan · Muning Wen · Ying Wen · Weinan Zhang · Jun Wang -
2023 Poster: Hierarchical Multi-Agent Skill Discovery »
Mingyu Yang · Yaodong Yang · Zhenbo Lu · Wengang Zhou · Houqiang Li -
2023 Poster: Lending Interaction Wings to Recommender Systems with Conversational Agents »
Jiarui Jin · Xianyu Chen · Fanghua Ye · Mengyue Yang · Yue Feng · Weinan Zhang · Yong Yu · Jun Wang -
2023 Poster: BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset »
Jiaming Ji · Mickel Liu · Josef Dai · Xuehai Pan · Chi Zhang · Ce Bian · Boyuan Chen · Ruiyang Sun · Yizhou Wang · Yaodong Yang -
2023 Poster: Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark »
Jiaming Ji · Borong Zhang · Jiayi Zhou · Xuehai Pan · Weidong Huang · Ruiyang Sun · Yiran Geng · Yifan Zhong · Josef Dai · Yaodong Yang -
2023 Poster: Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning »
Stephen McAleer · Gabriele Farina · Gaoyue Zhou · Mingzhi Wang · Yaodong Yang · Tuomas Sandholm -
2023 Poster: Multi-Agent First Order Constrained Optimization in Policy Space »
Youpeng Zhao · Yaodong Yang · Zhenbo Lu · Wengang Zhou · Houqiang Li -
2023 Poster: Invariant Learning via Probability of Sufficient and Necessary Causes »
Mengyue Yang · Yonggang Zhang · Zhen Fang · Yali Du · Furui Liu · Jean-Francois Ton · Jianhong Wang · Jun Wang -
2023 Poster: An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination »
Xue Yan · Jiaxian Guo · Xingzhou Lou · Jun Wang · Haifeng Zhang · Yali Du -
2023 Poster: Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach »
Yudi Zhang · Yali Du · Biwei Huang · Ziyan Wang · Jun Wang · Meng Fang · Mykola Pechenizkiy -
2023 Poster: Online PCA in Converging Self-consistent Field Equations »
Xihan Li · Xiang Chen · Rasul Tutunov · Haitham Bou Ammar · Lei Wang · Jun Wang -
2023 Poster: Policy Space Diversity for Non-Transitive Games »
Jian Yao · Weiming Liu · Haobo Fu · Yaodong Yang · Stephen McAleer · Qiang Fu · Wei Yang -
2023 Poster: ChessGPT: Bridging Policy Learning and Language Modeling »
Xidong Feng · Yicheng Luo · Ziyan Wang · Hongrui Tang · Mengyue Yang · Kun Shao · David Mguni · Yali Du · Jun Wang -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Optimistic Tree Searches for Combinatorial Black-Box Optimization »
Cedric Malherbe · Antoine Grosnit · Rasul Tutunov · Haitham Bou Ammar · Jun Wang -
2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 Poster: Optimistic Tree Searches for Combinatorial Black-Box Optimization »
Cedric Malherbe · Antoine Grosnit · Rasul Tutunov · Haitham Bou Ammar · Jun Wang -
2022 Poster: MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control »
Xuehai Pan · Mickel Liu · Fangwei Zhong · Yaodong Yang · Song-Chun Zhu · Yizhou Wang -
2022 Poster: Enhancing Safe Exploration Using Safety State Augmentation »
Aivar Sootla · Alexander Cowen-Rivers · Jun Wang · Haitham Bou Ammar -
2022 Poster: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem »
Muning Wen · Jakub Kuba · Runji Lin · Weinan Zhang · Ying Wen · Jun Wang · Yaodong Yang -
2022 Poster: EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine »
Jiayi Weng · Min Lin · Shengyi Huang · Bo Liu · Denys Makoviichuk · Viktor Makoviychuk · Zichen Liu · Yufan Song · Ting Luo · Yukun Jiang · Zhongwen Xu · Shuicheng Yan -
2022 Poster: Pluralistic Image Completion with Gaussian Mixture Models »
Xiaobo Xia · Wenhao Yang · Jie Ren · Yewen Li · Yibing Zhan · Bo Han · Tongliang Liu -
2021 Poster: Settling the Variance of Multi-Agent Policy Gradients »
Jakub Grudzien Kuba · Muning Wen · Linghui Meng · shangding gu · Haifeng Zhang · David Mguni · Jun Wang · Yaodong Yang -
2021 Poster: Neural Auto-Curricula in Two-Player Zero-Sum Games »
Xidong Feng · Oliver Slumbers · Ziyu Wan · Bo Liu · Stephen McAleer · Ying Wen · Jun Wang · Yaodong Yang -
2018 Poster: Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning »
Rui Luo · Jianhong Wang · Yaodong Yang · Jun WANG · Zhanxing Zhu -
2017 : Aligned AI Poster Session »
Amanda Askell · Rafal Muszynski · William Wang · Yaodong Yang · Quoc Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet · Candice Schumann · Anqi Liu · Peter Eckersley · Angelina Wang · William Saunders -
2017 : Poster Session »
Shunsuke Horii · Heejin Jeong · Tobias Schwedes · Qing He · Ben Calderhead · Ertunc Erdil · Jaan Altosaar · Patrick Muchmore · Rajiv Khanna · Ian Gemp · Pengfei Zhang · Yuan Zhou · Chris Cremer · Maria DeYoreo · Alexander Terenin · Brendan McVeigh · Rachit Singh · Yaodong Yang · Erik Bodin · Trefor Evans · Henry Chai · Shandian Zhe · Jeffrey Ling · Vincent ADAM · Lars Maaløe · Andrew Miller · Ari Pakman · Josip Djolonga · Hong Ge