

A Long $N$-step Surrogate Stage Reward for Deep Reinforcement Learning

Junmin Zhong · Ruofan Wu · Jennie Si

Great Hall & Hall B1+B2 (level 1) #1410
Wed 13 Dec 8:45 a.m. PST — 10:45 a.m. PST

Abstract: We introduce a new stage reward estimator named the long $N$-step surrogate stage (LNSS) reward for deep reinforcement learning (RL). It aims to mitigate the high variance problem, which has been shown to impede successful convergence of learning, hurt task performance, and hinder applications of deep RL in continuous control problems. In this paper we show that LNSS, which utilizes a long trajectory of rewards from future steps, provides consistent performance improvement as measured by average reward, convergence speed, learning success rate, and variance reduction in $Q$ values and rewards. Our evaluations are based on a variety of environments in DeepMind Control Suite and OpenAI Gym, using LNSS in baseline deep RL algorithms such as DDPG, D4PG, and TD3. We show that the LNSS reward enables good results that have previously been challenging to obtain with deep RL. Our analysis also shows that LNSS exponentially reduces the upper bound on the variances of $Q$ values relative to the respective single-step methods.
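
As a rough illustration of a stage reward built from a long trajectory of future rewards, the sketch below computes a discount-weighted average of the next $N$ rewards. The abstract does not state the LNSS formula, so the normalization, the function name `surrogate_stage_reward`, and its parameters are assumptions made for illustration rather than the authors' definition.

```python
def surrogate_stage_reward(rewards, gamma, n_steps):
    """Discount-weighted average of up to n_steps future rewards.

    Illustrative assumption only: the abstract does not give the exact
    LNSS formula, so this normalization is a hypothetical stand-in.
    """
    window = list(rewards[:n_steps])
    if not window:
        raise ValueError("need at least one reward")
    # Weight the k-th future reward by gamma**k.
    weights = [gamma ** k for k in range(len(window))]
    # Normalize so the result stays on the scale of a single-step reward.
    return sum(w * r for w, r in zip(weights, window)) / sum(weights)


if __name__ == "__main__":
    future_rewards = [1.0, 0.0, 2.0, 1.0, 0.5]
    print(surrogate_stage_reward(future_rewards, gamma=0.99, n_steps=5))
```

In an off-policy learner such as DDPG or TD3, a value like this would stand in for the single-step reward stored with each transition. With `n_steps=1` the function returns the ordinary single-step reward.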
