Timezone: »

 
AsymQ: Asymmetric Q-loss to mitigate overestimation bias in off-policy reinforcement learning
Qinsheng Zhang · Arjun Krishna · Sehoon Ha · Yongxin Chen
Event URL: https://openreview.net/forum?id=KoRdlJjgiQ »

It is well-known that off-policy deep reinforcement learning algorithms suffer from overestimation bias in value function approximation. Existing methods to reduce overestimation bias often utilize multiple value function estimators. Consequently, these methods have a larger time and memory consumption. In this work, we propose a new class of policy evaluation algorithms dubbed, \textbf{AsymQ}, that use asymmetric loss functions to train the Q-value network. Departing from the symmetric loss functions such as mean squared error~(MSE) and Huber loss on the Temporal difference~(TD) error, we adopt asymmetric loss functions of the TD-error to impose a higher penalty on overestimation error. We present one such AsymQ loss called \textbf{Softmax MSE~(SMSE)} that can be implemented with minimal modifications to the standard policy evaluation. Empirically, we show that using SMSE loss helps reduce estimation bias, and subsequently improves policy performance when combined with standard reinforcement learning algorithms. With SMSE, even the Deep Deterministic Policy Gradients~(DDPG) algorithm can achieve performance comparable to that of state-of-the-art methods such as the Twin-Delayed DDPG (TD3) and Soft Actor Critic~(SAC) on challenging environments in the OpenAI Gym MuJoCo benchmark. We additionally demonstrate that the proposed SMSE loss can also boost the performance of Deep Q learning (DQN) in Atari games with discrete action spaces.

Author Information

Qinsheng Zhang (Georgia Institute of Technology)
Arjun Krishna (Georgia Institute of Technology)
Sehoon Ha (Georgia Institute of Technology)
Yongxin Chen (Georgia Institute of Technology)

More from the Same Authors

  • 2021 : On the complexity of the optimal transport problem with graph-structured cost »
    Jiaojiao Fan · Isabel Haasler · Johan Karlsson · Yongxin Chen
  • 2021 : Variational Wasserstein gradient flow »
    Jiaojiao Fan · Amirhossein Taghvaei · Yongxin Chen
  • 2022 : Fast Sampling of Diffusion Models with Exponential Integrator »
    Qinsheng Zhang · Yongxin Chen
  • 2021 Poster: Diffusion Normalizing Flow »
    Qinsheng Zhang · Yongxin Chen
  • 2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
    Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
  • 2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
    Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
  • 2019 : Poster and Coffee Break 2 »
    Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall
  • 2019 : Poster Spotlight 2 »
    Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare
  • 2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost »
    Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang