Timezone: »
Poster
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli
Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #210
Minimax optimal convergence rates for numerous classes of stochastic convex optimization problems are well characterized, where the majority of results utilize iterate averaged stochastic gradient descent (SGD) with polynomially decaying step sizes. In contrast, the behavior of SGD’s final iterate has received much less attention despite the widespread use in practice. Motivated by this observation, this work provides a detailed study of the following question: what rate is achievable using the final iterate of SGD for the streaming least squares regression problem with and without strong convexity?
First, this work shows that even if the time horizon T (i.e. the number of iterations that SGD is run for) is known in advance, the behavior of SGD’s final iterate with any polynomially decaying learning rate scheme is highly sub-optimal compared to the statistical minimax rate (by a condition number factor in the strongly convex case and a factor of $\sqrt{T}$ in the non-strongly convex case). In contrast, this paper shows that Step Decay schedules, which cut the learning rate by a constant factor every constant number of epochs (i.e., the learning rate decays geometrically) offer significant improvements over any polynomially decaying step size schedule. In particular, the behavior of the final iterate with step decay schedules is off from the statistical minimax rate by only log factors (in the condition number for the strongly convex case, and in T in the non-strongly convex case). Finally, in stark contrast to the known horizon case, this paper shows that the anytime (i.e. the limiting) behavior of SGD’s final iterate is poor (in that it queries iterates with highly sub-optimal function value infinitely often, i.e. in a limsup sense) irrespective of the step size scheme employed. These results demonstrate the subtlety in establishing optimal learning rate schedules (for the final iterate) for stochastic gradient procedures in fixed time horizon settings.
Author Information
Rong Ge (Duke University)
Sham Kakade (University of Washington)
Rahul Kidambi (Cornell University)
Praneeth Netrapalli (Microsoft Research)
More from the Same Authors
-
2022 Spotlight: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Sham Kakade -
2022 Poster: Outlier-Robust Sparse Estimation via Non-Convex Optimization »
Yu Cheng · Ilias Diakonikolas · Rong Ge · Shivam Gupta · Daniel Kane · Mahdi Soltanolkotabi -
2021 : Invited Speaker Panel »
Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker -
2021 : Q&A for Sham Kakade »
Sham Kakade -
2021 : Generalization theory in Offline RL »
Sham Kakade -
2021 Poster: The Benefits of Implicit Regularization from SGD in Least Squares Problems »
Difan Zou · Jingfeng Wu · Vladimir Braverman · Quanquan Gu · Dean Foster · Sham Kakade -
2021 Poster: Robust and differentially private mean estimation »
Xiyang Liu · Weihao Kong · Sham Kakade · Sewoong Oh -
2021 Poster: Understanding Deflation Process in Over-parametrized Tensor Decomposition »
Rong Ge · Yunwei Ren · Xiang Wang · Mo Zhou -
2021 Poster: A Regression Approach to Learning-Augmented Online Algorithms »
Keerti Anand · Rong Ge · Amit Kumar · Debmalya Panigrahi -
2021 Poster: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2021 Poster: Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 Poster: Going Beyond Linear RL: Sample Efficient Neural Function Approximation »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Poster: LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes »
Aditya Kusupati · Matthew Wallingford · Vivek Ramanujan · Raghav Somani · Jae Sung Park · Krishna Pillutla · Prateek Jain · Sham Kakade · Ali Farhadi -
2021 Poster: Gone Fishing: Neural Active Learning with Fisher Embeddings »
Jordan Ash · Surbhi Goel · Akshay Krishnamurthy · Sham Kakade -
2021 Poster: MobILE: Model-Based Imitation Learning From Observation Alone »
Rahul Kidambi · Jonathan Chang · Wen Sun -
2021 Poster: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Oral: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2020 Poster: Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method »
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh -
2020 Spotlight: Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method »
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh -
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A »
Sham M Kakade · Martha White · Nicolas Le Roux -
2020 Poster: Robust Meta-learning for Mixed Linear Regression with Small Batches »
Weihao Kong · Raghav Somani · Sham Kakade · Sewoong Oh -
2020 Poster: Beyond Lazy Training for Over-parameterized Tensor Decomposition »
Xiang Wang · Chenwei Wu · Jason Lee · Tengyu Ma · Rong Ge -
2020 Poster: The Pitfalls of Simplicity Bias in Neural Networks »
Harshay Shah · Kaustav Tamuly · Aditi Raghunathan · Prateek Jain · Praneeth Netrapalli -
2020 Poster: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms »
Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli -
2020 Poster: Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games »
Arun Suggala · Praneeth Netrapalli -
2020 Spotlight: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms »
Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs »
Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun -
2020 Poster: PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning »
Alekh Agarwal · Mikael Henaff · Sham Kakade · Wen Sun -
2020 Poster: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Spotlight: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs »
Chi Jin · Sham Kakade · Akshay Krishnamurthy · Qinghua Liu -
2020 Oral: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs »
Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun -
2020 Poster: MOReL: Model-Based Offline Reinforcement Learning »
Rahul Kidambi · Aravind Rajeswaran · Praneeth Netrapalli · Thorsten Joachims -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Poster: Information Theoretic Regret Bounds for Online Nonlinear Control »
Sham Kakade · Akshay Krishnamurthy · Kendall Lowrey · Motoya Ohnishi · Wen Sun -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning »
Sham M Kakade · Martha White · Nicolas Le Roux -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 : Contributed talk: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? »
Praneeth Netrapalli -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : The Provable Effectiveness of Policy Gradient Methods in Reinforcement Learning »
Sham Kakade -
2019 Poster: Efficient Algorithms for Smooth Minimax Optimization »
Kiran Thekumparampil · Prateek Jain · Praneeth Netrapalli · Sewoong Oh -
2019 Poster: Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets »
Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora -
2019 Poster: Meta-Learning with Implicit Gradients »
Aravind Rajeswaran · Chelsea Finn · Sham Kakade · Sergey Levine -
2018 Poster: A Smoother Way to Train Structured Prediction Models »
Krishna Pillutla · Vincent Roulet · Sham Kakade · Zaid Harchaoui -
2018 Poster: Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds »
Raghav Somani · Chirag Gupta · Prateek Jain · Praneeth Netrapalli -
2018 Spotlight: Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds »
Raghav Somani · Chirag Gupta · Prateek Jain · Praneeth Netrapalli -
2018 Poster: On the Local Minima of the Empirical Risk »
Chi Jin · Lydia T. Liu · Rong Ge · Michael Jordan -
2018 Spotlight: On the Local Minima of the Empirical Risk »
Chi Jin · Lydia T. Liu · Rong Ge · Michael Jordan -
2018 Poster: Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo »
Holden Lee · Andrej Risteski · Rong Ge -
2018 Poster: Provably Correct Automatic Sub-Differentiation for Qualified Programs »
Sham Kakade · Jason Lee -
2017 Poster: Learning Overcomplete HMMs »
Vatsal Sharan · Sham Kakade · Percy Liang · Gregory Valiant -
2017 Poster: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2017 Oral: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2017 Poster: Towards Generalization and Simplicity in Continuous Control »
Aravind Rajeswaran · Kendall Lowrey · Emanuel Todorov · Sham Kakade -
2016 Poster: Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent »
Chi Jin · Sham Kakade · Praneeth Netrapalli -
2015 Poster: Convergence Rates of Active Learning for Maximum Likelihood Estimation »
Kamalika Chaudhuri · Sham Kakade · Praneeth Netrapalli · Sujay Sanghavi -
2015 Poster: Submodular Hamming Metrics »
Jennifer Gillenwater · Rishabh K Iyer · Bethany Lusch · Rahul Kidambi · Jeffrey A Bilmes -
2015 Poster: Super-Resolution Off the Grid »
Qingqing Huang · Sham Kakade -
2015 Spotlight: Super-Resolution Off the Grid »
Qingqing Huang · Sham Kakade -
2015 Spotlight: Submodular Hamming Metrics »
Jennifer Gillenwater · Rishabh K Iyer · Bethany Lusch · Rahul Kidambi · Jeffrey A Bilmes -
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade -
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade -
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu -
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · Yi-Kai Liu -
2011 Poster: Stochastic convex optimization with bandit feedback »
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin -
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang -
2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression »
Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir -
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade -
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade -
2009 Poster: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2009 Oral: Multi-Label Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang -
2008 Poster: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai Shalev-Shwartz · Sham M Kakade -
2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization »
Sham M Kakade · Karthik Sridharan · Ambuj Tewari -
2007 Oral: The Price of Bandit Information for Online Optimization »
Varsha Dani · Thomas P Hayes · Sham M Kakade -
2007 Poster: The Price of Bandit Information for Online Optimization »
Varsha Dani · Thomas P Hayes · Sham M Kakade