
Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou

Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #120
In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the $Q$-function in function approximation. Based on such an equivalence, we bring the key insight that a positive reward shifting leads to conservative exploitation, while a negative reward shifting leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) In discrete control tasks, a negative reward shifting yields an improvement over the curiosity-based exploration method.
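The equivalence between a constant reward shift and a change of $Q$-function initialization can be checked numerically in a small tabular setting. The following is a minimal sketch, not the authors' implementation. It assumes an infinite-horizon discounted MDP with no terminal states, and the random MDP, the helper `q_value_iteration`, and the constants `gamma` and `c` are all hypothetical choices made for illustration. Under these assumptions, adding a constant `c` to every reward moves every optimal $Q$-value by `c / (1 - gamma)` and leaves the greedy policy unchanged. Training on shifted rewards from a zero initialization therefore corresponds to training on the original rewards from an initialization offset by `-c / (1 - gamma)`: optimistic when `c` is negative, pessimistic when `c` is positive.

```python
import numpy as np


def q_value_iteration(P, R, gamma, iters=2000):
    """Tabular Q-value iteration.

    P: array of shape (S, A, S) with transition probabilities.
    R: array of shape (S, A) with expected rewards.
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = Q.max(axis=1)        # greedy state values, shape (S,)
        Q = R + gamma * (P @ V)  # Bellman optimality backup, shape (S, A)
    return Q


# Hypothetical random MDP, used only for illustration.
rng = np.random.default_rng(0)
S, A, gamma, c = 4, 2, 0.9, -1.0
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
R = rng.random((S, A))

Q_plain = q_value_iteration(P, R, gamma)
Q_shift = q_value_iteration(P, R + c, gamma)  # every reward shifted by c

# A constant shift c moves every optimal Q-value by c / (1 - gamma).
offset = c / (1.0 - gamma)
max_gap = np.abs(Q_shift - (Q_plain + offset)).max()
same_greedy = np.array_equal(Q_plain.argmax(axis=1), Q_shift.argmax(axis=1))
```

Here `max_gap` measures how far the shifted solution deviates from the predicted constant offset, and `same_greedy` records whether the two solutions select the same actions. With function approximation the fixed point is not reached exactly, which is where the initialization effect described in the abstract becomes relevant.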

Author Information

Hao Sun (Cambridge)
Lei Han (Tencent AI Lab)
Rui Yang (Hong Kong University of Science and Technology)
Rui Yang

I’m a first-year Ph.D. student in CSE at the Hong Kong University of Science and Technology, supervised by Prof. Tong Zhang. I received my master’s and bachelor’s degrees from the Department of Automation at Tsinghua University. My research interests lie in deep reinforcement learning (RL), especially goal-conditioned RL, offline RL, and model-based RL. I’m also interested in applying RL algorithms to game AI and robotics.

Xiaoteng Ma (Department of Automation, Tsinghua University)
Jian Guo
Bolei Zhou (University of California, Los Angeles (UCLA))
Bolei Zhou

An assistant professor in UCLA's computer science department.
