Modern machine learning models are often over-parameterized and, as a result, can interpolate the training data. In this setting, we study the convergence properties of a sampling-without-replacement variant of Stochastic Gradient Descent (SGD) known as Random Reshuffling (RR). Unlike SGD, which samples data with replacement at every iteration, RR chooses a random permutation of the data at the beginning of each epoch. For under-parameterized models, it has recently been shown that RR converges faster than SGD when the number of epochs exceeds the condition number (κ) of the problem, under standard assumptions such as strong convexity. However, previous works do not show that RR outperforms SGD under interpolation for strongly convex objectives. Here, we show that for the class of Polyak-Łojasiewicz (PL) functions, which generalizes strong convexity, RR can outperform SGD as long as the number of samples (n) is less than the parameter (ρ) of a strong growth condition (SGC).
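For reference, the two conditions named in the abstract are commonly stated as follows: f satisfies the PL inequality with parameter μ > 0, and the component gradients satisfy the SGC with parameter ρ, if for all x

$$\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\big(f(x) - f^\ast\big) \quad \text{(PL)}, \qquad \mathbb{E}_i\,\|\nabla f_i(x)\|^2 \;\le\; \rho\,\|\nabla f(x)\|^2 \quad \text{(SGC)},$$

where $f = \tfrac{1}{n}\sum_{i=1}^n f_i$ and $f^\ast$ is the minimum value of f. To make the algorithmic distinction concrete, below is a minimal sketch contrasting with-replacement SGD and Random Reshuffling on an interpolating least-squares problem; the quadratic objective, problem dimensions, and step size are illustrative assumptions, not the paper's experimental setup.

    import numpy as np

    # Minimal sketch, assuming a least-squares objective f(x) = (1/n) sum_i f_i(x)
    # with f_i(x) = 0.5 * (a_i^T x - b_i)^2. Under interpolation, every f_i is
    # minimized at the same point x_star.
    rng = np.random.default_rng(0)
    n, d = 50, 10
    A = rng.standard_normal((n, d))
    x_star = rng.standard_normal(d)
    b = A @ x_star                      # b is consistent, so the model interpolates

    def grad_i(x, i):
        # Gradient of the i-th component f_i
        return A[i] * (A[i] @ x - b[i])

    def sgd(x, epochs, lr):
        # With-replacement sampling: a fresh uniform index at every iteration.
        for _ in range(epochs * n):
            i = rng.integers(n)
            x = x - lr * grad_i(x, i)
        return x

    def random_reshuffling(x, epochs, lr):
        # RR: one random permutation per epoch; each sample is used exactly once.
        for _ in range(epochs):
            for i in rng.permutation(n):
                x = x - lr * grad_i(x, i)
        return x

    x0 = np.zeros(d)
    print("SGD error:", np.linalg.norm(sgd(x0, 100, 0.01) - x_star))
    print("RR  error:", np.linalg.norm(random_reshuffling(x0, 100, 0.01) - x_star))

Both methods perform the same number of stochastic gradient steps per epoch; the only difference is the sampling scheme, which is exactly the axis along which the paper compares convergence rates.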
Author Information
Chen Fan (University of British Columbia)
Christos Thrampoulidis (University of British Columbia)
Mark Schmidt (University of British Columbia)
More from the Same Authors
- 2021: Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers
  Jacques Chen · Frederik Kunstner · Mark Schmidt
- 2021: Faster Quasi-Newton Methods for Linear Composition Problems
  Betty Shea · Mark Schmidt
- 2021: A Closer Look at Gradient Estimators with Reinforcement Learning as Inference
  Jonathan Lavington · Michael Teng · Mark Schmidt · Frank Wood
- 2021: An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control
  Nicholas Ioannidis · Jonathan Lavington · Mark Schmidt
- 2022: On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data
  Tina Behnia · Ganesh Ramachandra Kini · Vala Vakilian · Christos Thrampoulidis
- 2022: Target-based Surrogates for Stochastic Optimization
  Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux
- 2022: Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint
  Amrutha Varshini Ramesh · Aaron Mishkin · Mark Schmidt
- 2022: Generalization of Decentralized Gradient Descent with Separable Data
  Hossein Taheri · Christos Thrampoulidis
- 2022: Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates
  Wu Lin · Valentin Duruisseaux · Melvin Leok · Frank Nielsen · Mohammad Emtiyaz Khan · Mark Schmidt
- 2023 Poster: Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
  Frederik Kunstner · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey
- 2023 Poster: Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
  Leonardo Galli · Holger Rauhut · Mark Schmidt
- 2023 Poster: BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization
  Chen Fan · Gaspard Choné-Ducasse · Mark Schmidt · Christos Thrampoulidis
- 2022: Poster Session 2
  Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi
- 2022 Poster: Imbalance Trouble: Revisiting Neural-Collapse Geometry
  Christos Thrampoulidis · Ganesh Ramachandra Kini · Vala Vakilian · Tina Behnia
- 2022 Poster: Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
  Haoyuan Sun · Kwangjun Ahn · Christos Thrampoulidis · Navid Azizan
- 2021 Poster: AutoBalance: Optimized Loss Functions for Imbalanced Data
  Mingchen Li · Xuechen Zhang · Christos Thrampoulidis · Jiasi Chen · Samet Oymak
- 2021 Poster: UCB-based Algorithms for Multinomial Logistic Regression Bandits
  Sanae Amani · Christos Thrampoulidis
- 2021 Poster: Label-Imbalanced and Group-Sensitive Classification under Overparameterization
  Ganesh Ramachandra Kini · Orestis Paraskevas · Samet Oymak · Christos Thrampoulidis
- 2021 Poster: Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
  Ke Wang · Vidya Muthukumar · Christos Thrampoulidis
- 2020: Closing remarks
  Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac
- 2020: Live Q&A with Michael Friedlander (Zoom)
  Mark Schmidt
- 2020: Intro to Invited Speaker 8
  Mark Schmidt
- 2020: Contributed talks in Session 3 (Zoom)
  Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang
- 2020: Live Q&A with Rachel Ward (Zoom)
  Mark Schmidt
- 2020: Live Q&A with Ashia Wilson (Zoom)
  Mark Schmidt
- 2020: Welcome remarks to Session 3
  Mark Schmidt
- 2020 Workshop: OPT2020: Optimization for Machine Learning
  Courtney Paquette · Mark Schmidt · Sebastian Stich · Quanquan Gu · Martin Takac
- 2020: Welcome event (gather.town)
  Quanquan Gu · Courtney Paquette · Mark Schmidt · Sebastian Stich · Martin Takac
- 2020 Poster: Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View
  Christos Thrampoulidis · Samet Oymak · Mahdi Soltanolkotabi
- 2020 Poster: Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
  Yihan Zhou · Victor Sanches Portella · Mark Schmidt · Nicholas Harvey
- 2020 Poster: Stage-wise Conservative Linear Bandits
  Ahmadreza Moradipari · Christos Thrampoulidis · Mahnoosh Alizadeh
- 2019 Poster: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
  Sharan Vaswani · Aaron Mishkin · Issam Laradji · Mark Schmidt · Gauthier Gidel · Simon Lacoste-Julien
- 2018 Poster: SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
  Aaron Mishkin · Frederik Kunstner · Didrik Nielsen · Mark Schmidt · Mohammad Emtiyaz Khan
- 2016: Fast Patch-based Style Transfer of Arbitrary Style
  Tian Qi Chen · Mark Schmidt
- 2015 Poster: StopWasting My Gradients: Practical SVRG
  Reza Babanezhad Harikandeh · Mohamed Osama Ahmed · Alim Virani · Mark Schmidt · Jakub Konečný · Scott Sallinen