Timezone: »
Solving multi-goal reinforcement learning (RL) problems with sparse rewards is generally challenging. Existing approaches have utilized goal relabeling on collected experiences to alleviate issues raised from sparse rewards. However, these methods are still limited in efficiency and cannot make full use of experiences. In this paper, we propose Model-based Hindsight Experience Replay (MHER), which exploits experiences more efficiently by leveraging environmental dynamics to generate virtual achieved goals. Replacing original goals with virtual goals generated from interaction with a trained dynamics model leads to a novel relabeling method, model-based relabeling (MBR). Based on MBR, MHER performs both reinforcement learning and supervised learning for efficient policy improvement. Theoretically, we also prove the supervised part in MHER, i.e., goal-conditioned supervised learning with MBR data, optimizes a lower bound on the multi-goal RL objective. Experimental results in several point-based tasks and simulated robotics environments show that MHER achieves significantly higher sample efficiency than previous model-free and model-based multi-goal methods.
Author Information
Yang Rui (Tsinghua University)
Meng Fang (Tencent)
Lei Han (Tencent AI Lab)
Yali Du (University College London)
I am currently a research fellow at UCL. I am interested in multi-agent reinforcement learning, adversarial machine learning and recommendation systems.
Feng Luo (Tsinghua University, Tsinghua University)
Xiu Li
More from the Same Authors
-
2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Poster: Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination »
Jiafei Lyu · Xiu Li · Zongqing Lu -
2022 Poster: OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression »
Wanhua Li · Xiaoke Huang · Zheng Zhu · Yansong Tang · Xiu Li · Jie Zhou · Jiwen Lu -
2022 Poster: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 : Constrained MDPs can be Solved by Eearly-Termination with Recurrent Models »
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou -
2022 : Supervised Q-Learning can be a Strong Baseline for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : State Advantage Weighting for Offline RL »
Jiafei Lyu · aicheng Gong · Le Wan · Zongqing Lu · Xiu Li -
2022 : Emergent collective intelligence from massive-agent cooperation and competition »
Hanmo Chen · Stone Tao · JIAXIN CHEN · Weihan Shen · Xihui Li · Chenghui Yu · Sikai Cheng · Xiaolong Zhu · Xiu Li -
2022 : Supervised Q-Learning for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : MOPA: a Minimalist Off-Policy Approach to Safe-RL »
Hao Sun · Ziping Xu · Zhenghao Peng · Meng Fang · Bo Dai · Bolei Zhou -
2022 Spotlight: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 Spotlight: Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination »
Jiafei Lyu · Xiu Li · Zongqing Lu -
2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · shubham sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping »
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou -
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration »
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang -
2020 Poster: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games »
Yunqiu Xu · Meng Fang · Ling Chen · Yali Du · Joey Tianyi Zhou · Chengqi Zhang -
2019 Poster: Curriculum-guided Hindsight Experience Replay »
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang -
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning »
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao