We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the early iterations. We further (i) show how to exploit support vectors to reduce the number of gradient computations in the later iterations, (ii) prove that the commonly used regularized SVRG iteration is justified and improves the convergence rate, (iii) consider alternate mini-batch selection strategies, and (iv) consider the generalization error of the method.
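As a rough illustration of the growing-batch idea from the abstract, the minimal Python sketch below estimates SVRG's snapshot ("full") gradient on a growing sample of the n loss terms, so early epochs need far fewer gradient evaluations. The function names, step size, doubling schedule, and least-squares usage example are illustrative assumptions, not the paper's exact algorithm or experiments.

```python
import numpy as np

def svrg_growing_batch(grad_i, x0, n, eta=0.1, epochs=20,
                       inner_iters=None, batch0=64, growth=2.0, rng=None):
    """Sketch of SVRG with a growing-batch snapshot gradient.

    grad_i(x, i) returns the gradient of the i-th loss term at x.
    The snapshot gradient mu is estimated on a growing subset of the
    n terms, trading a small control-variate error in early epochs
    for far fewer gradient evaluations.
    """
    rng = rng or np.random.default_rng(0)
    if inner_iters is None:
        inner_iters = n  # a common choice for the inner loop length
    x = x0.copy()
    batch = batch0
    for _ in range(epochs):
        snapshot = x.copy()
        # Estimate the snapshot gradient on a sampled batch, not all n terms.
        idx = rng.choice(n, size=min(int(batch), n), replace=False)
        mu = np.mean([grad_i(snapshot, i) for i in idx], axis=0)
        for _ in range(inner_iters):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient (control variate).
            g = grad_i(x, i) - grad_i(snapshot, i) + mu
            x = x - eta * g
        batch *= growth  # grow the batch so the snapshot error decreases
    return x

if __name__ == "__main__":
    # Illustrative usage on a synthetic least-squares problem.
    rng = np.random.default_rng(1)
    A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
    grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
    x_hat = svrg_growing_batch(grad_i, np.zeros(5), n=200, eta=0.05)
```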
Author Information
Reza Babanezhad Harikandeh (UBC)
Mohamed Osama Ahmed
Alim Virani
Mark Schmidt (University of British Columbia)
Jakub Konečný
Scott Sallinen (UBC)
More from the Same Authors
- 2021: Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers
  Jacques Chen · Frederik Kunstner · Mark Schmidt
- 2021: Faster Quasi-Newton Methods for Linear Composition Problems
  Betty Shea · Mark Schmidt
- 2021: Diurnal or Nocturnal? Federated Learning from Periodically Shifting Distributions
  Chen Zhu · Zheng Xu · Mingqing Chen · Jakub Konečný · Andrew S Hard · Tom Goldstein
- 2021: A Closer Look at Gradient Estimators with Reinforcement Learning as Inference
  Jonathan Lavington · Michael Teng · Mark Schmidt · Frank Wood
- 2021: An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control
  Nicholas Ioannidis · Jonathan Lavington · Mark Schmidt
- 2022: Target-based Surrogates for Stochastic Optimization
  Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux
- 2022: Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint
  Amrutha Varshini Ramesh · Aaron Mishkin · Mark Schmidt
- 2022: Fast Convergence of Random Reshuffling under Interpolation and the Polyak-Łojasiewicz Condition
  Chen Fan · Christos Thrampoulidis · Mark Schmidt
- 2022: Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates
  Wu Lin · Valentin Duruisseaux · Melvin Leok · Frank Nielsen · Mohammad Emtiyaz Khan · Mark Schmidt
- 2020: Closing remarks
  Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac
- 2020: Live Q&A with Michael Friedlander (Zoom)
  Mark Schmidt
- 2020: Intro to Invited Speaker 8
  Mark Schmidt
- 2020: Contributed talks in Session 3 (Zoom)
  Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang
- 2020: Live Q&A with Rachel Ward (Zoom)
  Mark Schmidt
- 2020: Live Q&A with Ashia Wilson (Zoom)
  Mark Schmidt
- 2020: Welcome remarks to Session 3
  Mark Schmidt
- 2020 Workshop: OPT2020: Optimization for Machine Learning
  Courtney Paquette · Mark Schmidt · Sebastian Stich · Quanquan Gu · Martin Takac
- 2020: Welcome event (gather.town)
  Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac
- 2020 Poster: Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
  Yihan Zhou · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey
- 2019 Poster: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
  Sharan Vaswani · Aaron Mishkin · Issam Laradji · Mark Schmidt · Gauthier Gidel · Simon Lacoste-Julien
- 2019 Poster: Reducing the variance in online optimization by transporting past gradients
  Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux
- 2019 Spotlight: Reducing the variance in online optimization by transporting past gradients
  Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux
- 2018 Poster: SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
  Aaron Mishkin · Frederik Kunstner · Didrik Nielsen · Mark Schmidt · Mohammad Emtiyaz Khan
- 2016: Fast Patch-based Style Transfer of Arbitrary Style
  Tian Qi Chen · Mark Schmidt