Timezone: »
In off-policy deep reinforcement learning, it is usually hard to collect sufficient successful experiences with sparse rewards to learn from. Hindsight experience replay (HER) enables an agent to learn from failures by treating the achieved state of a failed experience as a pseudo goal. However, not all the failed experiences are equally useful to different learning stages, so it is not efficient to replay all of them or uniform samples of them. In this paper, we propose to 1) adaptively select the failed experiences for replay according to the proximity to the true goals and the curiosity of exploration over diverse pseudo goals, and 2) gradually change the proportion of the goal-proximity and the diversity-based curiosity in the selection criteria: we adopt a human-like learning strategy that enforces more curiosity in earlier stages and changes to larger goal-proximity later. This ''Goal-and-Curiosity-driven Curriculum Learning'' leads to ''Curriculum-guided HER (CHER)'', which adaptively and dynamically controls the exploration-exploitation trade-off during the learning process via hindsight experience selection. We show that CHER improves the state of the art in challenging robotics environments.
Author Information
Meng Fang (Tencent)
Tianyi Zhou (University of Washington, Seattle)
Tianyi Zhou is a Ph.D. student in Computer Science at University of Washington and a member of MELODI lab led by Prof. Jeff A. Bilmes. He will be joining University of Maryland, College Park as a tenure-track assistant professor at the Department of Computer Science and affiliated with UMIACS in 2022. His research interests are in machine learning, optimization, and natural language processing. He has published ~60 papers at NeurIPS, ICML, ICLR, AISTATS, EMNLP, NAACL, COLING, KDD, ICDM, AAAI, IJCAI, ISIT, Machine Learning (Springer), IEEE TIP/TNNLS/TKDE, etc. He is the recipient of the Best Student Paper Award at ICDM 2013 and the 2020 IEEE TCSC Most Influential Paper Award.
Yali Du (University College London)
I am currently a research fellow at UCL. I am interested in multi-agent reinforcement learning, adversarial machine learning and recommendation systems.
Lei Han (Tencent AI Lab)
Zhengyou Zhang (Tencent)
More from the Same Authors
-
2021 Spotlight: Constrained Robust Submodular Partitioning »
Shengjie Wang · Tianyi Zhou · Chandrashekhar Lavania · Jeff A Bilmes -
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2021 Poster: Constrained Robust Submodular Partitioning »
Shengjie Wang · Tianyi Zhou · Chandrashekhar Lavania · Jeff A Bilmes -
2021 Poster: Class-Disentanglement and Applications in Adversarial Detection and Defense »
Kaiwen Yang · Tianyi Zhou · Yonggang Zhang · Xinmei Tian · Dacheng Tao -
2021 Poster: CO-PILOT: COllaborative Planning and reInforcement Learning On sub-Task curriculum »
Shuang Ao · Tianyi Zhou · Guodong Long · Qinghua Lu · Liming Zhu · Jing Jiang -
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration »
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang -
2020 Poster: Curriculum Learning by Dynamic Instance Hardness »
Tianyi Zhou · Shengjie Wang · Jeffrey A Bilmes -
2020 Poster: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games »
Yunqiu Xu · Meng Fang · Ling Chen · Yali Du · Joey Tianyi Zhou · Chengqi Zhang -
2019 Poster: Learning to Propagate for Graph Meta-Learning »
LU LIU · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang -
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning »
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao -
2018 Poster: Diverse Ensemble Evolution: Curriculum Data-Model Marriage »
Tianyi Zhou · Shengjie Wang · Jeffrey A Bilmes -
2014 Poster: Divide-and-Conquer Learning by Anchoring a Conical Hull »
Tianyi Zhou · Jeffrey A Bilmes · Carlos Guestrin