Poster in Workshop: Deep Reinforcement Learning

Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization

Alexandre Piche ⋅ Joseph Marino ⋅ Gian Maria Marconi ⋅ Valentin Thomas ⋅ Chris Pal ⋅ Mohammad Emtiyaz Khan

Abstract

Most recent successes in deep reinforcement learning are based on minimizing the squared Bellman error. Training is often unstable because the target $Q$-values change quickly, and target networks are employed to stabilize it by using an additional set of lagging parameters. Despite their advantages, target networks can inhibit the propagation of newly encountered rewards, which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike with target networks, the regularization here is explicit, which enables us not only to use up-to-date parameters but also to control the strength of the regularization. This leads to a fast yet stable training method. Across a range of Atari environments, we demonstrate empirical improvements over target-network-based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to the standard squared Bellman error.
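
The contrast drawn in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the abstract does not give the exact form of the regularizer, so the squared-difference penalty below, the weight `kappa`, the lagging table `q_prior`, and the list-of-lists $Q$-tables are all assumptions made for illustration.

```python
def target_network_loss(q_online, q_lagging, batch, gamma=0.99):
    """Squared Bellman error whose target is built from lagging parameters."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        # Bootstrap from the lagging table, as a target network would.
        bootstrap = 0.0 if done else gamma * max(q_lagging[s_next])
        total += (q_online[s][a] - (r + bootstrap)) ** 2
    return total / len(batch)


def functional_regularized_loss(q_online, q_prior, batch, gamma=0.99, kappa=0.5):
    """Squared Bellman error with an up-to-date target plus an explicit penalty.

    Assumed form: the penalty is the squared gap between the current
    Q-value and a lagging copy (q_prior), weighted by kappa.
    """
    total = 0.0
    for s, a, r, s_next, done in batch:
        # Bootstrap from the up-to-date table; a gradient-based version
        # would treat this target as a constant.
        bootstrap = 0.0 if done else gamma * max(q_online[s_next])
        bellman = (q_online[s][a] - (r + bootstrap)) ** 2
        # Explicit functional regularizer: stay close to the lagging Q-values.
        penalty = (q_online[s][a] - q_prior[s][a]) ** 2
        total += bellman + kappa * penalty
    return total / len(batch)


# Toy data: two states, two actions, one transition (s, a, r, s_next, done).
q_online = [[1.0, 2.0], [0.0, 3.0]]
q_lagging = [[0.5, 1.0], [0.0, 1.0]]
batch = [(0, 1, 1.0, 1, False)]

tn_loss = target_network_loss(q_online, q_lagging, batch, gamma=0.5)
fr_loss = functional_regularized_loss(q_online, q_lagging, batch, gamma=0.5, kappa=0.5)
```

In this sketch, setting `kappa` to zero recovers the unregularized squared Bellman error with up-to-date targets, while larger values pull the current $Q$-values more strongly toward the lagging copy; this is the sense in which an explicit regularizer is controllable.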
