Timezone: »
Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on Constrained Update Projection framework that enjoys rigorous safety guarantee. Central to our CUP development is the newly proposed surrogate functions along with the performance bound. Compared to previous safe reinforcement learning meth- ods, CUP enjoys the benefits of 1) CUP generalizes the surrogate functions to generalized advantage estimator (GAE), leading to strong empirical performance. 2) CUP unifies performance bounds, providing a better understanding and in- terpretability for some existing algorithms; 3) CUP provides a non-convex im- plementation via only first-order optimizers, which does not require any strong approximation on the convexity of the objectives. To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction. We have opened the source code of CUP at https://github.com/zmsn-2077/CUP-safe-rl.
Author Information
Long Yang (Zhejiang University)
Jiaming Ji
Juntao Dai (Zhejiang University)
Linrui Zhang (Tsinghua University)
Binbin Zhou (Zhejiang University City College)
Pengfei Li (Zhejiang University)
Yaodong Yang (AIG)
Gang Pan (Zhejiang University)
More from the Same Authors
-
2022 Poster: Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning »
Runze Liu · Fengshuo Bai · Yali Du · Yaodong Yang -
2022 Poster: Tracking Functional Changes in Nonstationary Signals with Evolutionary Ensemble Bayesian Model for Robust Neural Decoding »
Xinyun Zhu · Yu Qi · Gang Pan · Yueming Wang -
2022 Poster: Online Neural Sequence Detection with Hierarchical Dirichlet Point Process »
Weihan Li · Yu Qi · Gang Pan -
2022 Poster: A Unified Diversity Measure for Multiagent Reinforcement Learning »
Zongkai Liu · Chao Yu · Yaodong Yang · peng sun · Zifan Wu · Yuan Li -
2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 : TorchOpt: An Efficient library for Differentiable Optimization »
Jie Ren · Xidong Feng · Bo Liu · Xuehai Pan · Yao Fu · Luo Mai · Yaodong Yang -
2022 : Contextual Transformer for Offline Meta Reinforcement Learning »
Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang -
2022 Spotlight: Lightning Talks 3A-2 »
shuwen yang · Xu Zhang · Delvin Ce Zhang · Lan-Zhe Guo · Renzhe Xu · Zhuoer Xu · Yao-Xiang Ding · Weihan Li · Xingxuan Zhang · Xi-Zhu Wu · Zhenyuan Yuan · Hady Lauw · Yu Qi · Yi-Ge Zhang · Zhihao Yang · Guanghui Zhu · Dong Li · Changhua Meng · Kun Zhou · Gang Pan · Zhi-Fan Wu · Bo Li · Minghui Zhu · Zhi-Hua Zhou · Yafeng Zhang · Yingxueff Zhang · shiwen cui · Jie-Jing Shao · Zhanguang Zhang · Zhenzhe Ying · Xiaolong Chen · Yu-Feng Li · Guojie Song · Peng Cui · Weiqiang Wang · Ming GU · Jianye Hao · Yihua Huang -
2022 Spotlight: Online Neural Sequence Detection with Hierarchical Dirichlet Point Process »
Weihan Li · Yu Qi · Gang Pan -
2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 Poster: MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control »
Xuehai Pan · Mickel Liu · Fangwei Zhong · Yaodong Yang · Song-Chun Zhu · Yizhou Wang -
2022 Poster: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem »
Muning Wen · Jakub Kuba · Runji Lin · Weinan Zhang · Ying Wen · Jun Wang · Yaodong Yang -
2022 Poster: A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning »
Bo Liu · Xidong Feng · Jie Ren · Luo Mai · Rui Zhu · Haifeng Zhang · Jun Wang · Yaodong Yang -
2020 Poster: Reconstructing Perceptive Images from Brain Activity by Shape-Semantic GAN »
Tao Fang · Yu Qi · Gang Pan -
2020 Oral: Reconstructing Perceptive Images from Brain Activity by Shape-Semantic GAN »
Tao Fang · Yu Qi · Gang Pan -
2019 Poster: Dynamic Ensemble Modeling Approach to Nonstationary Neural Decoding in Brain-Computer Interfaces »
Yu Qi · Bin Liu · Yueming Wang · Gang Pan -
2018 Poster: Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning »
Rui Luo · Jianhong Wang · Yaodong Yang · Jun WANG · Zhanxing Zhu -
2017 : Aligned AI Poster Session »
Amanda Askell · Rafal Muszynski · William Wang · Yaodong Yang · Quoc Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet · Candice Schumann · Anqi Liu · Peter Eckersley · Angelina Wang · William Saunders -
2017 : Poster Session »
Shunsuke Horii · Heejin Jeong · Tobias Schwedes · Qing He · Ben Calderhead · Ertunc Erdil · Jaan Altosaar · Patrick Muchmore · Rajiv Khanna · Ian Gemp · Pengfei Zhang · Yuan Zhou · Chris Cremer · Maria DeYoreo · Alexander Terenin · Brendan McVeigh · Rachit Singh · Yaodong Yang · Erik Bodin · Trefor Evans · Henry Chai · Shandian Zhe · Jeffrey Ling · Vincent ADAM · Lars Maaløe · Andrew Miller · Ari Pakman · Josip Djolonga · Hong Ge