Timezone: »
Poster
Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou
In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the $Q$-function in function approximation. Based on such an equivalence, we bring the key insight that a positive reward shifting leads to conservative exploitation, while a negative reward shifting leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) In discrete control tasks, a negative reward shifting yields an improvement over the curiosity-based exploration method.
Author Information
Hao Sun (University of Cambridge)
Lei Han (Tencent AI Lab)
Rui Yang (Hong Kong University of Science and Technology)
I’m a first year Ph.D. student at CSE, the Hong Kong University of Science and Technology, supervised by Prof. Tong Zhang. I received my master’s degree and bachelor’s degree from the Department of Automation at Tsinghua University. My research interests lie in deep reinforcement learning (RL), especially goal-conditioned RL, offline RL and model-based RL. I’m also interested in the application of RL algorithms to game AI and robotics.
Xiaoteng Ma (Department of Automation, Tsinghua University)
Jian Guo
Bolei Zhou (UCLA)

Assistant professor at UCLA's computer science department
More from the Same Authors
-
2021 Spotlight: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning »
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao -
2021 : GPU-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning »
Xiao-Yang Liu · Zhuoran Yang · Zhaoran Wang · Anwar Walid · Jian Guo · Michael Jordan -
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Poster: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 : ChemSpacE: Interpretable and Interactive Chemical Space Exploration »
Yuanqi Du · Xian Liu · Nilay Shah · Shengchao Liu · Jieyu Zhang · Bolei Zhou -
2022 : Constrained MDPs can be Solved by Eearly-Termination with Recurrent Models »
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou -
2022 : Supervised Q-Learning can be a Strong Baseline for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : GraphCG: Unsupervised Discovery of Steerable Factors in Graphs »
Shengchao Liu · Chengpeng Wang · Weili Nie · Hanchen Wang · Jiarui Lu · Bolei Zhou · Jian Tang -
2022 : Supervised Q-Learning for Continuous Control »
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou -
2022 : MOPA: a Minimalist Off-Policy Approach to Safe-RL »
Hao Sun · Ziping Xu · Zhenghao Peng · Meng Fang · Bo Dai · Bolei Zhou -
2022 : Novel Policy Seeking with Constrained Optimization »
Hao Sun · Zhenghao Peng · Bolei Zhou -
2022 : Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference »
Hao Sun · Taiyi Wang -
2023 Poster: Learning from Active Human Involvement through Proxy Value Propagation »
Zhenghao Peng · Wenjie Mo · Chenda Duan · Quanyi Li · Bolei Zhou -
2023 Poster: Corruption-Robust Offline Reinforcement Learning with General Function Approximation »
Chenlu Ye · Rui Yang · Quanquan Gu · Tong Zhang -
2023 Poster: MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy »
Honghua Dong · Jiawei Xu · Yu Yang · Rui Zhao · Shiwen Wu · Chun Yuan · Xiu Li · Chris Maddison · Lei Han -
2023 Poster: AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation »
Tong Wu · Zhihao Fan · Xiao Liu · Yeyun Gong · yelong shen · Jian Jiao · Hai-Tao Zheng · Juntao Li · zhongyu wei · Jian Guo · Nan Duan · Weizhu Chen -
2023 Poster: ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling »
Quanyi Li · Zhenghao Peng · Lan Feng · Zhizheng Liu · Chenda Duan · Wenjie Mo · Bolei Zhou -
2022 Spotlight: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 : Toward Generalizable Embodied AI for Autonomous Driving »
Bolei Zhou -
2022 Poster: Human-AI Shared Control via Policy Dissection »
Quanyi Li · Zhenghao Peng · Haibin Wu · Lan Feng · Bolei Zhou -
2022 Poster: Improving GANs with A Dynamic Discriminator »
Ceyuan Yang · Yujun Shen · Yinghao Xu · Deli Zhao · Bo Dai · Bolei Zhou -
2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning »
Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo -
2021 Poster: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning »
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao -
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration »
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Curriculum-guided Hindsight Experience Replay »
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang -
2019 Poster: Policy Continuation with Hindsight Inverse Dynamics »
Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin -
2019 Spotlight: Policy Continuation with Hindsight Inverse Dynamics »
Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin -
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning »
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao