Value function learning plays a central role in many state-of-the-art reinforcement learning algorithms. Many popular algorithms, such as Q-learning, do not optimize any objective function; instead, they are fixed-point iterations of some variant of the Bellman operator that is not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods with guaranteed convergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms such as residual gradient. Our approach may be combined with general function classes such as neural networks, using either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks, including classic problems where standard algorithms are known to diverge.
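The double-sample issue mentioned in the abstract can be illustrated with a small numerical sketch. It is not the loss proposed in the paper; it only shows, on a hypothetical one-state problem, why squaring a single sampled Bellman residual gives a biased estimate, and why two independent next-state samples remove that bias. The function name, the toy values, and the choice of discount are all assumptions made for illustration.

```python
import numpy as np


def bellman_residual_estimates(n_samples=100_000, seed=0):
    """Compare single-sample and double-sample estimates of the
    squared Bellman residual on a hypothetical one-state problem."""
    rng = np.random.default_rng(seed)

    # Toy setup (assumed for illustration): from state s, the next state
    # is one of two states with equal probability, with values +1 and -1.
    gamma = 1.0
    reward = 0.0
    v_s = 0.0
    v_next = np.array([1.0, -1.0])

    # The expected residual is 0.5 * (+1) + 0.5 * (-1) - 0 = 0,
    # so the true squared Bellman residual at s is 0.

    # Two independent draws of the next state for every sample of s.
    idx_a = rng.integers(0, 2, size=n_samples)
    idx_b = rng.integers(0, 2, size=n_samples)
    delta_a = reward + gamma * v_next[idx_a] - v_s
    delta_b = reward + gamma * v_next[idx_b] - v_s

    # Single sample: E[delta^2] = (E[delta])^2 + Var(delta), so the
    # estimate is inflated by the variance of the next-state value.
    single = float(np.mean(delta_a ** 2))

    # Double sample: E[delta_a * delta_b] = (E[delta])^2 because the
    # two draws are independent given s.
    double = float(np.mean(delta_a * delta_b))
    return single, double


single, double = bellman_residual_estimates()
print(f"single-sample estimate: {single:.3f}")
print(f"double-sample estimate: {double:.3f}")
```

In this toy problem the single-sample estimate stays at the variance of the sampled residual even though the true residual is zero, while the double-sample estimate concentrates near zero. Drawing two independent next states from the same state is usually only possible with a simulator, which is the practical obstacle the abstract refers to.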
Author Information
Yihao Feng (UT Austin)
I am a Ph.D. student at UT Austin, where I work on Reinforcement Learning and Approximate Inference. I am looking for internships for summer 2020! Please feel free to contact me (yihao AT cs.utexas.edu) if you have open positions!
Lihong Li (Google Research)
Qiang Liu (UT Austin)
More from the Same Authors
-
2022 Workshop: Reinforcement Learning for Real Life (RL4RealLife) Workshop »
Yuxi Li · Emma Brunskill · MINMIN CHEN · Omer Gottesman · Lihong Li · Yao Liu · Zhiwei Tony Qin · Matthew Taylor -
2020 : Panel »
Emma Brunskill · Nan Jiang · Nando de Freitas · Finale Doshi-Velez · Sergey Levine · John Langford · Lihong Li · George Tucker · Rishabh Agarwal · Aviral Kumar -
2020 Poster: Stein Self-Repulsive Dynamics: Benefits From Past Samples »
Mao Ye · Tongzheng Ren · Qiang Liu -
2020 Poster: Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework »
Dinghuai Zhang · Mao Ye · Chengyue Gong · Zhanxing Zhu · Qiang Liu -
2020 Poster: Certified Monotonic Neural Networks »
Xingchao Liu · Xing Han · Na Zhang · Qiang Liu -
2020 Spotlight: Certified Monotonic Neural Networks »
Xingchao Liu · Xing Han · Na Zhang · Qiang Liu -
2020 Poster: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks »
Lemeng Wu · Bo Liu · Peter Stone · Qiang Liu -
2020 Poster: Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough »
Mao Ye · Lemeng Wu · Qiang Liu -
2020 Poster: Escaping the Gravitational Pull of Softmax »
Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Oral: Escaping the Gravitational Pull of Softmax »
Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Off-Policy Evaluation via the Regularized Lagrangian »
Mengjiao (Sherry) Yang · Ofir Nachum · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Poster: Off-Policy Interval Estimation with Lipschitz Value Iteration »
Ziyang Tang · Yihao Feng · Na Zhang · Jian Peng · Qiang Liu -
2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2019 : Closing Remarks »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Workshop: The Optimization Foundations of Reinforcement Learning »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 : Opening Remarks »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 Poster: Splitting Steepest Descent for Growing Neural Architectures »
Lemeng Wu · Dilin Wang · Qiang Liu -
2019 Spotlight: Splitting Steepest Descent for Growing Neural Architectures »
Lemeng Wu · Dilin Wang · Qiang Liu -
2019 Poster: Stein Variational Gradient Descent With Matrix-Valued Kernels »
Dilin Wang · Ziyang Tang · Chandrajit Bajaj · Qiang Liu -
2019 Poster: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Spotlight: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Poster: Exploration via Hindsight Goal Generation »
Zhizhou Ren · Kefan Dong · Yuan Zhou · Qiang Liu · Jian Peng -
2018 : Hierarchical reinforcement learning for composite-task dialogues »
Lihong Li -
2018 Poster: Variational Inference with Tail-adaptive f-Divergence »
Dilin Wang · Hao Liu · Qiang Liu -
2018 Oral: Variational Inference with Tail-adaptive f-Divergence »
Dilin Wang · Hao Liu · Qiang Liu -
2018 Poster: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation »
Qiang Liu · Lihong Li · Ziyang Tang · Denny Zhou -
2018 Spotlight: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation »
Qiang Liu · Lihong Li · Ziyang Tang · Denny Zhou -
2018 Poster: Stein Variational Gradient Descent as Moment Matching »
Qiang Liu · Dilin Wang -
2018 Poster: Adversarial Attacks on Stochastic Bandits »
Kwang-Sung Jun · Lihong Li · Yuzhe Ma · Jerry Zhu -
2017 Workshop: From 'What If?' To 'What Next?' : Causal Inference and Machine Learning for Intelligent Decision Making »
Ricardo Silva · Panagiotis Toulis · John Shawe-Taylor · Alexander Volfovsky · Thorsten Joachims · Lihong Li · Nathan Kallus · Adith Swaminathan -
2017 Poster: Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes »
Jianshu Chen · Chong Wang · Lin Xiao · Ji He · Lihong Li · Li Deng -
2016 Poster: Active Learning with Oracle Epiphany »
Tzu-Kuo Huang · Lihong Li · Ara Vartanian · Saleema Amershi · Jerry Zhu -
2011 Poster: An Empirical Evaluation of Thompson Sampling »
Olivier Chapelle · Lihong Li -
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade -
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade -
2010 Poster: Parallelized Stochastic Gradient Descent »
Martin A Zinkevich · Markus Weimer · Alexander Smola · Lihong Li -
2008 Poster: Sparse Online Learning via Truncated Gradient »
John Langford · Lihong Li · Tong Zhang -
2008 Spotlight: Sparse Online Learning via Truncated Gradient »
John Langford · Lihong Li · Tong Zhang