Timezone: »
Poster
Label Noise SGD Provably Prefers Flat Global Minimizers
Alex Damian · Tengyu Ma · Jason Lee
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies that demonstrate that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to a stationary point of a regularized loss $L(\theta) +\lambda R(\theta)$, where $L(\theta)$ is the training loss, $\lambda$ is an effective regularization parameter depending on the step size, strength of the label noise, and the batch size, and $R(\theta)$ is an explicit regularizer that penalizes sharp minimizers. Our analysis uncovers an additional regularization effect of large learning rates beyond the linear scaling rule that penalizes large eigenvalues of the Hessian more than small ones. We also prove extensions to classification with general loss functions, significantly strengthening the prior work of Blanc et al. to global convergence and large learning rates and of HaoChen et al. to general models.
Author Information
Alex Damian (Princeton University)
Tengyu Ma (Stanford University)
Jason Lee (University of Southern California)
More from the Same Authors
-
2021 Spotlight: Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning »
Colin Wei · Sang Michael Xie · Tengyu Ma -
2021 : Calibrated Ensembles: A Simple Way to Mitigate ID-OOD Accuracy Tradeoffs »
Ananya Kumar · Aditi Raghunathan · Tengyu Ma · Percy Liang -
2021 : Self-supervised Learning is More Robust to Dataset Imbalance »
Hong Liu · Jeff Z. HaoChen · Adrien Gaidon · Tengyu Ma -
2021 : Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2022 : Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability »
Alex Damian · Eshaan Nichani · Jason Lee -
2023 : Teaching Arithmetic to Small Transformers »
Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos -
2023 : Towards Optimal Statistical Watermarking »
Baihe Huang · Banghua Zhu · Hanlin Zhu · Jason Lee · Jiantao Jiao · Michael Jordan -
2023 : Provably Efficient CVaR RL in Low-rank MDPs »
Yulai Zhao · Wenhao Zhan · Xiaoyan Hu · Ho-fung Leung · Farzan Farnia · Wen Sun · Jason Lee -
2023 Workshop: Mathematics of Modern Machine Learning (M3L) »
Zhiyuan Li · Tengyu Ma · Surbhi Goel · Kaifeng Lyu · Christina Baek · Bingbin Liu · Alex Damian · Aditi Raghunathan -
2023 Poster: Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms »
Qian Yu · Yining Wang · Baihe Huang · Qi Lei · Jason Lee -
2023 Poster: Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage »
Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun -
2023 Poster: Fine-Tuning Language Models with Just Forward Passes »
Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora -
2023 Poster: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models »
Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee -
2023 Oral: Fine-Tuning Language Models with Just Forward Passes »
Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora -
2023 Oral: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models »
Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee -
2023 Poster: Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning »
Gen Li · Wenhao Zhan · Jason Lee · Yuejie Chi · Yuxin Chen -
2023 Poster: Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks »
Eshaan Nichani · Alex Damian · Jason Lee -
2023 Poster: Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability »
Jingfeng Wu · Vladimir Braverman · Jason Lee -
2022 Poster: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials »
Eshaan Nichani · Yu Bai · Jason Lee -
2022 Poster: Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems »
Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun -
2022 Poster: Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent »
Zhiyuan Li · Tianhao Wang · Jason Lee · Sanjeev Arora -
2022 Poster: On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias »
Itay Safran · Gal Vardi · Jason Lee -
2022 Poster: From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent »
Christopher De Sa · Satyen Kale · Jason Lee · Ayush Sekhari · Karthik Sridharan -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Q&A »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 Poster: How Fine-Tuning Allows for Effective Meta-Learning »
Kurtland Chua · Qi Lei · Jason Lee -
2021 Poster: Going Beyond Linear RL: Sample Efficient Neural Function Approximation »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Poster: Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations »
Yuping Luo · Tengyu Ma -
2021 Poster: Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration »
Shengjia Zhao · Michael Kim · Roshni Sahoo · Tengyu Ma · Stefano Ermon -
2021 Poster: Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning »
Colin Wei · Sang Michael Xie · Tengyu Ma -
2021 Oral: Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss »
Jeff Z. HaoChen · Colin Wei · Adrien Gaidon · Tengyu Ma -
2021 Poster: Safe Reinforcement Learning by Imagining the Near Future »
Garrett Thomas · Yuping Luo · Tengyu Ma -
2021 Poster: Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss »
Jeff Z. HaoChen · Colin Wei · Adrien Gaidon · Tengyu Ma -
2021 Poster: Predicting What You Already Know Helps: Provable Self-Supervised Learning »
Jason Lee · Qi Lei · Nikunj Saunshi · JIACHENG ZHUO -
2021 Poster: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Poster: Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature »
Kefan Dong · Jiaqi Yang · Tengyu Ma -
2019 : Tengyu Ma, "Designing Explicit Regularizers for Deep Models" »
Tengyu Ma -
2018 : Poster Session »
Sujay Sanghavi · Vatsal Shah · Yanyao Shen · Tianchen Zhao · Yuandong Tian · Tomer Galanti · Mufan Li · Gilad Cohen · Daniel Rothchild · Aristide Baratin · Devansh Arpit · Vagelis Papalexakis · Michael Perlmutter · Ashok Vardhan Makkuva · Pim de Haan · Yingyan Lin · Wanmo Kang · Cheolhyoung Lee · Hao Shen · Sho Yaida · Dan Roberts · Nadav Cohen · Philippe Casgrain · Dejiao Zhang · Tengyu Ma · Avinash Ravichandran · Julian Emilio Salazar · Bo Li · Davis Liang · Christopher Wong · Glen Bigan Mbeng · Animesh Garg -
2018 : Contributed Talk 1 »
Jason Lee -
2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Poster: Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced »
Simon Du · Wei Hu · Jason Lee -
2018 Poster: Adding One Neuron Can Eliminate All Bad Local Minima »
SHIYU LIANG · Ruoyu Sun · Jason Lee · R. Srikant -
2018 Poster: Provably Correct Automatic Sub-Differentiation for Qualified Programs »
Sham Kakade · Jason Lee -
2018 Poster: On the Convergence and Robustness of Training GANs with Regularized Optimal Transport »
Maziar Sanjabi · Jimmy Ba · Meisam Razaviyayn · Jason Lee -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Oral: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2016 Oral: Matrix Completion has No Spurious Local Minimum »
Rong Ge · Jason Lee · Tengyu Ma -
2016 Poster: Matrix Completion has No Spurious Local Minimum »
Rong Ge · Jason Lee · Tengyu Ma -
2016 Poster: A Non-generative Framework and Convex Relaxations for Unsupervised Learning »
Elad Hazan · Tengyu Ma -
2015 Poster: Evaluating the statistical significance of biclusters »
Jason D Lee · Yuekai Sun · Jonathan E Taylor -
2015 Poster: Sum-of-Squares Lower Bounds for Sparse PCA »
Tengyu Ma · Avi Wigderson -
2014 Poster: On Communication Cost of Distributed Statistical Estimation and Dimensionality »
Ankit Garg · Tengyu Ma · Huy Nguyen -
2014 Oral: On Communication Cost of Distributed Statistical Estimation and Dimensionality »
Ankit Garg · Tengyu Ma · Huy Nguyen -
2014 Poster: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices »
Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich -
2014 Spotlight: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices »
Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich -
2014 Poster: Exact Post Model Selection Inference for Marginal Screening »
Jason D Lee · Jonathan E Taylor -
2013 Poster: On model selection consistency of penalized M-estimators: a geometric theory »
Jason D Lee · Yuekai Sun · Jonathan E Taylor -
2013 Poster: Using multiple samples to learn mixture models »
Jason D Lee · Ran Gilad-Bachrach · Rich Caruana -
2013 Spotlight: Using multiple samples to learn mixture models »
Jason D Lee · Ran Gilad-Bachrach · Rich Caruana -
2012 Poster: Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form »
Jason D Lee · Yuekai Sun · Michael Saunders -
2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp