Oral
Non-delusional Q-learning and value-iteration
Tyler Lu · Dale Schuurmans · Craig Boutilier

Thu Dec 06 01:25 PM -- 01:40 PM (PST) @ Room 220 CD

We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Since standard Q-updates make globally uncoordinated action choices with respect to the expressible policy class, inconsistent or even conflicting Q-value estimates can result, leading to pathological behaviour such as over- or under-estimation, instability and even divergence. To solve this problem, we introduce a new notion of policy consistency and define a local backup process that ensures global consistency through the use of information sets, that is, sets that record constraints on policies consistent with backed-up Q-values. We prove that both the model-based and model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms furthermore only require polynomially many information sets (from a potentially exponential support). Finally, we suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.
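
As a rough illustration of what it means for an approximation architecture to limit the class of expressible greedy policies, the sketch below enumerates the greedy policies reachable by a one-parameter linear approximator. The two-state, two-action problem, the feature table `PHI`, and the grid of `theta` values are all hypothetical and chosen only for illustration; this is not the authors' algorithm and does not implement information sets.

```python
from itertools import product

STATES = ("s1", "s2")
ACTIONS = ("a", "b")

# Hypothetical one-dimensional features phi(s, a); Q(s, a) = theta * phi(s, a).
PHI = {
    ("s1", "a"): 1.0, ("s1", "b"): 0.0,
    ("s2", "a"): 0.0, ("s2", "b"): 1.0,
}

def greedy_policy(theta):
    """Return the greedy action in each state under Q(s, a) = theta * phi(s, a)."""
    return tuple(
        max(ACTIONS, key=lambda a: theta * PHI[(s, a)]) for s in STATES
    )

def expressible_policies(thetas):
    """Collect every greedy policy realised by some parameter value in thetas."""
    return {greedy_policy(theta) for theta in thetas}

# Nonzero parameter values only, so that no state has tied Q-values.
thetas = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

expressible = expressible_policies(thetas)
all_policies = set(product(ACTIONS, repeat=len(STATES)))
inexpressible = all_policies - expressible

print("expressible:  ", sorted(expressible))
print("inexpressible:", sorted(inexpressible))
```

In this toy setting, a positive `theta` picks `a` in `s1` and `b` in `s2`, and a negative `theta` does the reverse, so no single parameter value selects the same action in both states. A backup that takes the maximising action independently in each state could therefore combine action choices that no expressible greedy policy makes jointly, which is the kind of uncoordinated choice the abstract describes.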

Author Information

Tyler Lu (Google)
Dale Schuurmans (Google Inc.)
Craig Boutilier (Google)

More from the Same Authors

  • 2020 Poster: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration »
    Hanjun Dai · Rishabh Singh · Bo Dai · Charles Sutton · Dale Schuurmans
  • 2020 Poster: Differentiable Meta-Learning of Bandit Policies »
    Craig Boutilier · Chih-wei Hsu · Branislav Kveton · Martin Mladenov · Csaba Szepesvari · Manzil Zaheer
  • 2020 Poster: Latent Bandits Revisited »
    Joey Hong · Branislav Kveton · Manzil Zaheer · Yinlam Chow · Amr Ahmed · Craig Boutilier
  • 2020 Poster: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs »
    Nevena Lazic · Dong Yin · Mehrdad Farajtabar · Nir Levine · Dilan Gorur · Chris Harris · Dale Schuurmans
  • 2020 Poster: Escaping the Gravitational Pull of Softmax »
    Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans
  • 2020 Oral: Escaping the Gravitational Pull of Softmax »
    Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans
  • 2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
    Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans
  • 2020 Poster: Off-Policy Evaluation via the Regularized Lagrangian »
    Sherry Yang · Ofir Nachum · Bo Dai · Lihong Li · Dale Schuurmans
  • 2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
    Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans
  • 2019 : Closing Remarks »
    Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
  • 2019 : Poster Spotlight 2 »
    Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare
  • 2019 : Poster and Coffee Break 1 »
    Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · zz Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Vincent Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova
  • 2019 Workshop: The Optimization Foundations of Reinforcement Learning »
    Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
  • 2019 : Opening Remarks »
    Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White
  • 2019 Poster: Exponential Family Estimation via Adversarial Dynamics Embedding »
    Bo Dai · Zhen Liu · Hanjun Dai · Niao He · Arthur Gretton · Le Song · Dale Schuurmans
  • 2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
    Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle
  • 2018 Poster: Non-delusional Q-learning and value-iteration »
    Tyler Lu · Dale Schuurmans · Craig Boutilier
  • 2018 Poster: Data center cooling using model-predictive control »
    Nevena Lazic · Craig Boutilier · Tyler Lu · Eehern Wong · Binz Roy · Moonkyung Ryu · Greg Imwalle