Timezone: »
Poster
The Role of Baselines in Policy Gradient Optimization
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans
We study the effect of baselines in on-policy stochastic policy gradient optimization, and close the gap between the theory and practice of policy optimization methods. Our first contribution is to show that the \emph{state value} baseline allows on-policy stochastic \emph{natural} policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate, which was not previously known. The analysis relies on two novel findings: the expected progress of the NPG update satisfies a stochastic version of the non-uniform \L{}ojasiewicz (N\L{}) inequality, and with probability 1 the state value baseline prevents the optimal action's probability from vanishing, thus ensuring sufficient exploration. Importantly, these results provide a new understanding of the role of baselines in stochastic policy gradient: by showing that the variance of natural policy gradient estimates remains unbounded with or without a baseline, we find that variance reduction \emph{cannot} explain their utility in this setting. Instead, the analysis reveals that the primary effect of the value baseline is to \textbf{reduce the aggressiveness of the updates} rather than their variance. That is, we demonstrate that a finite variance is \emph{not necessary} for almost sure convergence of stochastic NPG, while controlling update aggressiveness is both necessary and sufficient. Additional experimental results verify these theoretical findings.
Author Information
Jincheng Mei (Google Research, Brain Team)
Wesley Chung (McGill University)
Valentin Thomas (Mila)
Bo Dai (Google Brain)
Csaba Szepesvari (University of Alberta)
Dale Schuurmans (Google Brain & University of Alberta)
More from the Same Authors
-
2021 Spotlight: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 : Offline Policy Selection under Uncertainty »
Mengjiao (Sherry) Yang · Bo Dai · Ofir Nachum · George Tucker · Dale Schuurmans -
2021 : Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization »
Alexandre Piche · Joseph Marino · Gian Maria Marconi · Valentin Thomas · Chris Pal · Mohammad Emtiyaz Khan -
2022 Poster: On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation »
Valentin Thomas -
2023 Poster: Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL »
Qinghua Liu · Gellért Weisz · András György · Chi Jin · Csaba Szepesvari -
2023 Poster: Ordering-based Conditions for Global Convergence of Policy Gradient Methods »
Jincheng Mei · Bo Dai · Alekh Agarwal · Mohammad Ghavamzadeh · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: AdaPlanner: Adaptive Planning from Feedback with Language Models »
Haotian Sun · Yuchen Zhuang · Lingkai Kong · Bo Dai · Chao Zhang -
2023 Poster: Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off »
Zichen Zhang · Johannes Kirschner · Junxi Zhang · Francesco Zanini · Alex Ayoub · Masood Dehghan · Dale Schuurmans -
2023 Poster: Learning Universal Policies via Text-Guided Video Generation »
Yilun Du · Mengjiao (Sherry) Yang · Bo Dai · Hanjun Dai · Ofir Nachum · Josh Tenenbaum · Dale Schuurmans · Pieter Abbeel -
2023 Poster: Regret Minimization via Saddle Point Optimization »
Johannes Kirschner · Alireza Bakhtiari · Kushagra Chandak · Volodymyr Tkachuk · Csaba Szepesvari -
2023 Poster: Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore »
Gellért Weisz · András György · Csaba Szepesvari -
2023 Poster: Context-lumpable stochastic bandits »
Chung-Wei Lee · Qinghua Liu · Yasin Abbasi Yadkori · Chi Jin · Tor Lattimore · Csaba Szepesvari -
2023 Poster: DISCS: A Benchmark for Discrete Sampling »
Katayoon Goshvadi · Haoran Sun · Xingchao Liu · Azade Nova · Ruqi Zhang · Will Grathwohl · Dale Schuurmans · Hanjun Dai -
2023 Oral: Ordering-based Conditions for Global Convergence of Policy Gradient Methods »
Jincheng Mei · Bo Dai · Alekh Agarwal · Mohammad Ghavamzadeh · Csaba Szepesvari · Dale Schuurmans -
2023 Oral: Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore »
Gellért Weisz · András György · Csaba Szepesvari -
2023 Workshop: Foundation Models for Decision Making »
Mengjiao (Sherry) Yang · Ofir Nachum · Yilun Du · Stephen McAleer · Igor Mordatch · Linxi Fan · Jeannette Bohg · Dale Schuurmans -
2022 Poster: A Simple Decentralized Cross-Entropy Method »
Zichen Zhang · Jun Jin · Martin Jagersand · Jun Luo · Dale Schuurmans -
2022 Poster: Oracle Inequalities for Model Selection in Offline Reinforcement Learning »
Jonathan N Lee · George Tucker · Ofir Nachum · Bo Dai · Emma Brunskill -
2022 Poster: Chain of Thought Imitation with Procedure Cloning »
Mengjiao (Sherry) Yang · Dale Schuurmans · Pieter Abbeel · Ofir Nachum -
2022 Poster: Optimal Scaling for Locally Balanced Proposals in Discrete Spaces »
Haoran Sun · Hanjun Dai · Dale Schuurmans -
2022 Poster: Sample-Efficient Reinforcement Learning of Partially Observable Markov Games »
Qinghua Liu · Csaba Szepesvari · Chi Jin -
2022 Poster: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models »
Jason Wei · Xuezhi Wang · Dale Schuurmans · Maarten Bosma · brian ichter · Fei Xia · Ed Chi · Quoc V Le · Denny Zhou -
2022 Poster: Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs »
Gellért Weisz · András György · Tadashi Kozuno · Csaba Szepesvari -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization »
Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang -
2022 Poster: On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games »
Runyu Zhang · Jincheng Mei · Bo Dai · Dale Schuurmans · Na Li -
2021 : Dale Schuurmans Talk Q&A »
Dale Schuurmans -
2021 : Invited Talk: Dale Schuurmans - Understanding Deep Value Estimation »
Dale Schuurmans -
2021 Poster: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 Poster: Towards understanding retrosynthesis by energy-based models »
Ruoxi Sun · Hanjun Dai · Li Li · Steven Kearnes · Bo Dai -
2021 Poster: Understanding the Effect of Stochasticity in Policy Optimization »
Jincheng Mei · Bo Dai · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Nearly Horizon-Free Offline Reinforcement Learning »
Tongzheng Ren · Jialian Li · Bo Dai · Simon Du · Sujay Sanghavi -
2020 Poster: Off-Policy Imitation Learning from Observations »
Zhuangdi Zhu · Kaixiang Lin · Bo Dai · Jiayu Zhou -
2020 Poster: Differentiable Top-k with Optimal Transport »
Yujia Xie · Hanjun Dai · Minshuo Chen · Bo Dai · Tuo Zhao · Hongyuan Zha · Wei Wei · Tomas Pfister -
2020 Poster: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration »
Hanjun Dai · Rishabh Singh · Bo Dai · Charles Sutton · Dale Schuurmans -
2020 Poster: Escaping the Gravitational Pull of Softmax »
Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Oral: Escaping the Gravitational Pull of Softmax »
Jincheng Mei · Chenjun Xiao · Bo Dai · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach »
Luofeng Liao · You-Lin Chen · Zhuoran Yang · Bo Dai · Mladen Kolar · Zhaoran Wang -
2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Off-Policy Evaluation via the Regularized Lagrangian »
Mengjiao (Sherry) Yang · Ofir Nachum · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2019 : Closing Remarks »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Workshop: The Optimization Foundations of Reinforcement Learning »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 : Opening Remarks »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 Poster: Maximum Entropy Monte-Carlo Planning »
Chenjun Xiao · Ruitong Huang · Jincheng Mei · Dale Schuurmans · Martin Müller -
2019 Poster: Meta Architecture Search »
Albert Shaw · Wei Wei · Weiyang Liu · Le Song · Bo Dai -
2019 Poster: Surrogate Objectives for Batch Policy Optimization in One-step Decision Making »
Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans -
2019 Poster: Exponential Family Estimation via Adversarial Dynamics Embedding »
Bo Dai · Zhen Liu · Hanjun Dai · Niao He · Arthur Gretton · Le Song · Dale Schuurmans -
2019 Poster: Energy-Inspired Models: Learning with Sampler-Induced Distributions »
Dieterich Lawson · George Tucker · Bo Dai · Rajesh Ranganath -
2019 Poster: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Spotlight: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Poster: Retrosynthesis Prediction with Conditional Graph Logic Network »
Hanjun Dai · Chengtao Li · Connor Coley · Bo Dai · Le Song -
2019 Poster: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2019 Spotlight: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2018 : Off-policy Policy Optimization (Dale Schuurmans) »
Dale Schuurmans -
2018 Poster: Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification »
Harsh Shrivastava · Eugene Bart · Bob Price · Hanjun Dai · Bo Dai · Srinivas Aluru -
2018 Poster: Coupled Variational Bayes via Optimization Embedding »
Bo Dai · Hanjun Dai · Niao He · Weiyang Liu · Zhen Liu · Jianshu Chen · Lin Xiao · Le Song -
2018 Poster: Predictive Approximate Bayesian Computation via Saddle Points »
Yingxiang Yang · Bo Dai · Negar Kiyavash · Niao He -
2018 Poster: Learning towards Minimum Hyperspherical Energy »
Weiyang Liu · Rongmei Lin · Zhen Liu · Lixin Liu · Zhiding Yu · Bo Dai · Le Song -
2017 : Poster session + Coffee break »
Mikael Kågebäck · Igor Melnyk · Amir-Hossein Karimi · Gino Brunner · Ershad Banijamali · Chris Donahue · Jake Zhao · Giambattista Parascandolo · Valentin Thomas · Abhishek Kumar · Chris Burgess · Amanda Nilsson · Maria Larsson · Cian Eastwood · Momchil Peychev -
2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans -
2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation »
Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari -
2017 Poster: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2017 Spotlight: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2016 Poster: Deep Learning Games »
Dale Schuurmans · Martin A Zinkevich -
2016 Poster: Reward Augmented Maximum Likelihood for Neural Structured Prediction »
Mohammad Norouzi · Samy Bengio · zhifeng Chen · Navdeep Jaitly · Mike Schuster · Yonghui Wu · Dale Schuurmans -
2015 Poster: Embedding Inference for Structured Multilabel Prediction »
Farzaneh Mirzazadeh · Siamak Ravanbakhsh · Nan Ding · Dale Schuurmans -
2014 Workshop: Representation and Learning Methods for Complex Outputs »
Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto -
2014 Poster: Convex Deep Learning via Normalized Kernels »
Özlem Aslan · Xinhua Zhang · Dale Schuurmans -
2014 Poster: Scalable Kernel Methods via Doubly Stochastic Gradients »
Bo Dai · Bo Xie · Niao He · Yingyu Liang · Anant Raj · Maria-Florina F Balcan · Le Song -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: Robust Low Rank Kernel Embeddings of Multivariate Distributions »
Le Song · Bo Dai -
2013 Poster: Convex Two-Layer Modeling »
Özlem Aslan · Hao Cheng · Xinhua Zhang · Dale Schuurmans -
2013 Spotlight: Convex Two-Layer Modeling »
Özlem Aslan · Hao Cheng · Xinhua Zhang · Dale Schuurmans -
2013 Poster: Polar Operators for Structured Sparse Estimation »
Xinhua Zhang · Yao-Liang Yu · Dale Schuurmans -
2012 Poster: Convex Multi-view Subspace Learning »
Martha White · Yao-Liang Yu · Xinhua Zhang · Dale Schuurmans -
2012 Poster: Accelerated Training for Matrix-norm Regularization: A Boosting Approach »
Xinhua Zhang · Yao-Liang Yu · Dale Schuurmans -
2012 Poster: A Polynomial-time Form of Robust Regression »
Yao-Liang Yu · Özlem Aslan · Dale Schuurmans -
2010 Poster: Relaxed Clipping: A Global Training Method for Robust Regression and Classification »
Yao-Liang Yu · Min Yang · Linli Xu · Martha White · Dale Schuurmans -
2009 Poster: Convex Relaxation of Mixture Regression with Efficient Algorithms »
Novi Quadrianto · Tiberio Caetano · John Lim · Dale Schuurmans -
2009 Poster: A General Projection Property for Distribution Families »
Yao-Liang Yu · Yuxi Li · Dale Schuurmans · Csaba Szepesvari -
2007 Spotlight: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Poster: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Session: Spotlights »
Dale Schuurmans -
2007 Poster: Convex Relaxations of EM »
Yuhong Guo · Dale Schuurmans -
2007 Poster: Discriminative Batch Mode Active Learning »
Yuhong Guo · Dale Schuurmans -
2006 Poster: Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields »
Chi-Hoon Lee · Shaojun Wang · Feng Jiao · Dale Schuurmans · Russell Greiner -
2006 Poster: implicit Online Learning with Kernels »
Li Cheng · Vishwanathan S V N · Dale Schuurmans · Shaojun Wang · Terry Caelli