Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually progressing toward difficult ones. In this work, we focus on framing CRL as an interpolation between a source (auxiliary) and a target task distribution. Although existing studies have shown the great potential of this idea, it remains unclear how to formally quantify and generate the movement between task distributions. Inspired by insights from gradual domain adaptation in semi-supervised learning, we create a natural curriculum by breaking down the potentially large task distributional shift in CRL into smaller shifts. We propose GRADIENT, which formulates CRL as an optimal transport problem with a tailored distance metric between tasks. Specifically, we generate a sequence of task distributions as a geodesic interpolation between the source and target distributions; each intermediate distribution is a Wasserstein barycenter of the two. Unlike many existing methods, our algorithm considers a task-dependent contextual distance metric and can handle nonparametric distributions in both continuous and discrete context settings. In addition, we theoretically show that GRADIENT enables smooth transfer between subsequent stages in the curriculum under certain conditions. Extensive experiments on locomotion and manipulation tasks show that GRADIENT outperforms baselines in both learning efficiency and asymptotic performance.
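To make the interpolation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the POT library (`pip install pot`), a hypothetical one-dimensional task context, uniform weights, and a squared-Euclidean ground cost; the paper instead uses a tailored, task-dependent distance metric. Each intermediate distribution is the Wasserstein barycenter of the source and target with weights (1 - t, t), obtained by displacement interpolation along the optimal transport plan.

```python
# Minimal sketch of geodesic interpolation between two task
# distributions, NOT the authors' code. Hypothetical 1-D context,
# uniform weights, squared-Euclidean cost; GRADIENT itself uses a
# tailored task-dependent metric as the ground cost.
import numpy as np
import ot  # Python Optimal Transport (POT)

# Source (easy) and target (hard) task distributions over a 1-D context.
src_ctx = np.array([[0.1], [0.2], [0.3]])  # e.g., short goal distances
tgt_ctx = np.array([[0.8], [0.9], [1.0]])  # e.g., long goal distances
a = np.full(len(src_ctx), 1.0 / len(src_ctx))  # uniform source weights
b = np.full(len(tgt_ctx), 1.0 / len(tgt_ctx))  # uniform target weights

# Ground cost (squared Euclidean by default) and optimal transport plan.
M = ot.dist(src_ctx, tgt_ctx)
plan = ot.emd(a, b, M)

def interpolate(t):
    """Displacement interpolation at weight t in [0, 1]: each unit of
    mass the plan moves from x to y sits at (1 - t) * x + t * y. The
    result is the Wasserstein barycenter with weights (1 - t, t)."""
    supports, weights = [], []
    for i in range(len(src_ctx)):
        for j in range(len(tgt_ctx)):
            if plan[i, j] > 1e-12:
                supports.append((1 - t) * src_ctx[i] + t * tgt_ctx[j])
                weights.append(plan[i, j])
    return np.vstack(supports), np.array(weights)

# A curriculum: stage k trains on the k-th interpolated distribution,
# so the large source-to-target shift becomes a chain of small shifts.
for t in np.linspace(0.0, 1.0, 5):
    ctx, w = interpolate(t)
    print(f"t={t:.2f} contexts={ctx.ravel().round(2)} weights={w.round(2)}")
```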
Author Information
Peide Huang (Carnegie Mellon University)
Mengdi Xu (Carnegie Mellon University)
Jiacheng Zhu (Carnegie Mellon University)
Laixi Shi (Carnegie Mellon University)
Fei Fang (Carnegie Mellon University)
Ding Zhao (Carnegie Mellon University)
More from the Same Authors
- 2021 Spotlight: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
  Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi
- 2021: Latent Goal Allocation for Multi-Agent Goal-Conditioned Self-Supervised Learning
  Laixi Shi · Peide Huang · Rui Chen
- 2022: Hyper-Decision Transformer for Efficient Online Policy Adaptation
  Mengdi Xu · Yuchen Lu · Yikang Shen · Shun Zhang · Ding Zhao · Chuang Gan
- 2022: Benchmarking Robustness under Distribution Shift of Multimodal Image-Text Models
  Jielin Qiu · Yi Zhu · Xingjian Shi · Zhiqiang Tang · Ding Zhao · Bo Li · Mu Li
- 2022: On the Robustness of Safe Reinforcement Learning under Observational Perturbations
  Zuxin Liu · Zijian Guo · Zhepeng Cen · Huan Zhang · Jie Tan · Bo Li · Ding Zhao
- 2023 Poster: Adaptive Online Replanning with Diffusion Models
  Siyuan Zhou · Yilun Du · Shun Zhang · Mengdi Xu · Yikang Shen · Wei Xiao · Dit-Yan Yeung · Chuang Gan
- 2023 Poster: The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model
  Laixi Shi · Gen Li · Yuting Wei · Yuxin Chen · Matthieu Geist · Yuejie Chi
- 2023 Poster: Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
  Wenhao Ding · Laixi Shi · Yuejie Chi · Ding Zhao
- 2023 Poster: Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
  Yihang Yao · Zuxin Liu · Zhepeng Cen · Jiacheng Zhu · Wenhao Yu · Tingnan Zhang · Ding Zhao
- 2023 Poster: A One-Size-Fits-All Approach to Improving Randomness in Paper Assignment
  Yixuan Xu · Steven Jecmen · Zimeng Song · Fei Fang
- 2023 Poster: Learning Shared Safety Constraints from Multi-task Demonstrations
  Konwoo Kim · Gokul Swamy · Zuxin Liu · Ding Zhao · Sanjiban Choudhury · Steven Wu
- 2023 Workshop: Computational Sustainability: Promises and Pitfalls from Theory to Deployment
  Suzanne Stathatos · Christopher Yeh · Laura Greenstreet · Tarun Sharma · Katelyn Morrison · Yuanqi Du · Chenlin Meng · Sherrie Wang · Fei Fang · Pietro Perona · Yoshua Bengio
- 2022 Poster: Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning
  Wenhao Ding · Haohong Lin · Bo Li · Ding Zhao
- 2022 Poster: PerfectDou: Dominating DouDizhu with Perfect Information Distillation
  Guan Yang · Minghuan Liu · Weijun Hong · Weinan Zhang · Fei Fang · Guangjun Zeng · Yue Lin
- 2022 Poster: Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality
  Jibang Wu · Weiran Shen · Fei Fang · Haifeng Xu
- 2022 Poster: SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles
  Chejian Xu · Wenhao Ding · Weijie Lyu · Zuxin Liu · Shuai Wang · Yihan He · Hanjiang Hu · Ding Zhao · Bo Li
- 2021 Poster: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
  Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi
- 2020 Poster: Deep Archimedean Copulas
  Chun Kai Ling · Fei Fang · J. Zico Kolter
- 2020 Poster: Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
  Mengdi Xu · Wenhao Ding · Jiacheng Zhu · Zuxin Liu · Baiming Chen · Ding Zhao
- 2020 Poster: Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments
  Steven Jecmen · Hanrui Zhang · Ryan Liu · Nihar Shah · Vincent Conitzer · Fei Fang
- 2019: Poster Session
  Rishav Chourasia · Yichong Xu · Corinna Cortes · Chien-Yi Chang · Yoshihiro Nagano · So Yeon Min · Benedikt Boecking · Phi Vu Tran · Kamyar Ghasemipour · Qianggang Ding · Shouvik Mani · Vikram Voleti · Rasool Fakoor · Miao Xu · Kenneth Marino · Lisa Lee · Volker Tresp · Jean-Francois Kagy · Marvin Zhang · Barnabas Poczos · Dinesh Khandelwal · Adrien Bardes · Evan Shelhamer · Jiacheng Zhu · Ziming Li · Xiaoyan Li · Dmitrii Krasheninnikov · Ruohan Wang · Mayoore Jaiswal · Emad Barsoum · Suvansh Sanjeev · Theeraphol Wattanavekin · Qizhe Xie · Sifan Wu · Yuki Yoshida · David Kanaa · Sina Khoshfetrat Pakazad · Mehdi Maasoumy
- 2019 Poster: Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks
  Gabriele Farina · Chun Kai Ling · Fei Fang · Tuomas Sandholm
- 2019 Poster: Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium
  Gabriele Farina · Chun Kai Ling · Fei Fang · Tuomas Sandholm
- 2019 Spotlight: Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium
  Gabriele Farina · Chun Kai Ling · Fei Fang · Tuomas Sandholm