Poster
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
Simon Du · Wei Hu · Jason Lee
We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions, including feed-forward fully connected and convolutional deep neural networks with linear, ReLU, or Leaky ReLU activations. We rigorously prove that gradient flow (i.e., gradient descent with infinitesimal step size) effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization. This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers. Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization. Inspired by our findings for gradient flow, we prove that gradient descent with step sizes $\eta_t = O(t^{-(1/2+\delta)})$, for $0 < \delta \le 1/2$, automatically balances the two low-rank factors and converges to a bounded global optimum. Furthermore, for rank-1 asymmetric matrix factorization we give a finer analysis showing that gradient descent with a constant step size converges to the global minimum at a globally linear rate. We believe that the idea of examining the invariance imposed by first-order algorithms in learning homogeneous models could serve as a fundamental building block for studying optimization for learning deep models.
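The balancedness claim in the abstract is easy to probe numerically. Below is a minimal NumPy sketch (illustrative only, not the authors' code; all dimensions, step sizes, and variable names are arbitrary choices) that trains a two-layer linear model with plain gradient descent at a small step size and tracks $\|W_1\|_F^2 - \|W_2\|_F^2$ alongside the loss; the difference should stay nearly constant while the loss decreases, mirroring the gradient-flow invariant.

# Minimal sketch (not from the paper): empirically checking the balancedness
# invariant for a two-layer linear model f(x) = W2 @ W1 @ x trained by plain
# gradient descent with a small step size (approximating gradient flow).
# The quantity ||W1||_F^2 - ||W2||_F^2 should remain (nearly) constant.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 5, 8, 3, 100          # illustrative dimensions

X = rng.normal(size=(d_in, n))
W_star = rng.normal(size=(d_out, d_in))
Y = W_star @ X                                   # targets from a planted linear map

W1 = 0.1 * rng.normal(size=(d_hidden, d_in))     # small initialization
W2 = 0.1 * rng.normal(size=(d_out, d_hidden))

eta = 1e-3                                       # small step size ~ gradient flow
for t in range(20001):
    R = W2 @ W1 @ X - Y                          # residuals
    loss = 0.5 * np.sum(R ** 2) / n
    grad_W2 = (R @ (W1 @ X).T) / n               # dL/dW2
    grad_W1 = (W2.T @ R @ X.T) / n               # dL/dW1
    if t % 2000 == 0:
        diff = np.sum(W1 ** 2) - np.sum(W2 ** 2)
        print(f"iter {t:6d}  loss {loss:8.5f}  ||W1||_F^2 - ||W2||_F^2 = {diff:.5f}")
    W1 -= eta * grad_W1
    W2 -= eta * grad_W2

The same check can be repeated with ReLU or Leaky ReLU activations, or with the decaying step sizes $\eta_t = O(t^{-(1/2+\delta)})$ that the abstract uses for asymmetric matrix factorization; with larger step sizes the difference drifts by an amount of order $\eta^2$ per step rather than staying exactly invariant.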
Author Information
Simon Du (Carnegie Mellon University)
Wei Hu (Princeton University)
Jason Lee (University of Southern California)
More from the Same Authors
- 2021 Spotlight: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
  Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric
- 2022 : Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability »
  Alex Damian · Eshaan Nichani · Jason Lee
- 2022 : Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations »
  Yongyi Yang · Jacob Steinhardt · Wei Hu
- 2023 : Teaching Arithmetic to Small Transformers »
  Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos
- 2023 : Towards Optimal Statistical Watermarking »
  Baihe Huang · Banghua Zhu · Hanlin Zhu · Jason Lee · Jiantao Jiao · Michael Jordan
- 2023 : Provably Efficient CVaR RL in Low-rank MDPs »
  Yulai Zhao · Wenhao Zhan · Xiaoyan Hu · Ho-fung Leung · Farzan Farnia · Wen Sun · Jason Lee
- 2023 Poster: Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms »
  Qian Yu · Yining Wang · Baihe Huang · Qi Lei · Jason Lee
- 2023 Poster: Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage »
  Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- 2023 Poster: Fine-Tuning Language Models with Just Forward Passes »
  Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora
- 2023 Poster: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models »
  Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee
- 2023 Oral: Fine-Tuning Language Models with Just Forward Passes »
  Sadhika Malladi · Tianyu Gao · Eshaan Nichani · Alex Damian · Jason Lee · Danqi Chen · Sanjeev Arora
- 2023 Oral: Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models »
  Alex Damian · Eshaan Nichani · Rong Ge · Jason Lee
- 2023 Poster: Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning »
  Gen Li · Wenhao Zhan · Jason Lee · Yuejie Chi · Yuxin Chen
- 2023 Poster: Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks »
  Eshaan Nichani · Alex Damian · Jason Lee
- 2023 Poster: Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability »
  Jingfeng Wu · Vladimir Braverman · Jason Lee
- 2022 Poster: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials »
  Eshaan Nichani · Yu Bai · Jason Lee
- 2022 Poster: Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems »
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2022 Poster: Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent »
  Zhiyuan Li · Tianhao Wang · Jason Lee · Sanjeev Arora
- 2022 Poster: On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias »
  Itay Safran · Gal Vardi · Jason Lee
- 2022 Poster: From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent »
  Christopher De Sa · Satyen Kale · Jason Lee · Ayush Sekhari · Karthik Sridharan
- 2021 Poster: How Fine-Tuning Allows for Effective Meta-Learning »
  Kurtland Chua · Qi Lei · Jason Lee
- 2021 Poster: Label Noise SGD Provably Prefers Flat Global Minimizers »
  Alex Damian · Tengyu Ma · Jason Lee
- 2021 Poster: Going Beyond Linear RL: Sample Efficient Neural Function Approximation »
  Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang
- 2021 Poster: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
  Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric
- 2021 Poster: Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP »
  Zihan Zhang · Jiaqi Yang · Xiangyang Ji · Simon Du
- 2021 Poster: Corruption Robust Active Learning »
  Yifang Chen · Simon Du · Kevin Jamieson
- 2021 Poster: Nearly Horizon-Free Offline Reinforcement Learning »
  Tongzheng Ren · Jialian Li · Bo Dai · Simon Du · Sujay Sanghavi
- 2021 Poster: Predicting What You Already Know Helps: Provable Self-Supervised Learning »
  Jason Lee · Qi Lei · Nikunj Saunshi · JIACHENG ZHUO
- 2021 Poster: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization »
  Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang
- 2021 Poster: Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization »
  Tian Ye · Simon Du
- 2020 Poster: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
  Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington
- 2020 Spotlight: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
  Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington
- 2019 Poster: Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets »
  Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora
- 2019 Poster: Implicit Regularization in Deep Matrix Factorization »
  Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo
- 2019 Spotlight: Implicit Regularization in Deep Matrix Factorization »
  Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo
- 2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang
- 2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
  Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang
- 2018 : Contributed Talk 1 »
  Jason Lee
- 2018 Poster: Online Improper Learning with an Approximation Oracle »
  Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li
- 2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
  Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh
- 2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
  Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro
- 2018 Poster: Adding One Neuron Can Eliminate All Bad Local Minima »
  SHIYU LIANG · Ruoyu Sun · Jason Lee · R. Srikant
- 2018 Poster: Provably Correct Automatic Sub-Differentiation for Qualified Programs »
  Sham Kakade · Jason Lee
- 2018 Poster: On the Convergence and Robustness of Training GANs with Regularized Optimal Transport »
  Maziar Sanjabi · Jimmy Ba · Meisam Razaviyayn · Jason Lee
- 2017 Poster: Hypothesis Transfer Learning via Transformation Functions »
  Simon Du · Jayanth Koushik · Aarti Singh · Barnabas Poczos
- 2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
  Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos
- 2017 Poster: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls »
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2017 Spotlight: Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls »
  Zeyuan Allen-Zhu · Elad Hazan · Wei Hu · Yuanzhi Li
- 2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
  Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos
- 2017 Poster: On the Power of Truncated SVD for General High-rank Matrix Estimation Problems »
  Simon Du · Yining Wang · Aarti Singh
- 2016 Oral: Matrix Completion has No Spurious Local Minimum »
  Rong Ge · Jason Lee · Tengyu Ma
- 2016 Poster: Combinatorial Multi-Armed Bandit with General Reward Functions »
  Wei Chen · Wei Hu · Fu Li · Jian Li · Yu Liu · Pinyan Lu
- 2016 Poster: Matrix Completion has No Spurious Local Minimum »
  Rong Ge · Jason Lee · Tengyu Ma
- 2016 Poster: Efficient Nonparametric Smoothness Estimation »
  Shashank Singh · Simon Du · Barnabas Poczos
- 2015 Poster: Evaluating the statistical significance of biclusters »
  Jason D Lee · Yuekai Sun · Jonathan E Taylor
- 2014 Poster: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices »
  Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich
- 2014 Spotlight: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices »
  Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich
- 2014 Poster: Exact Post Model Selection Inference for Marginal Screening »
  Jason D Lee · Jonathan E Taylor
- 2013 Poster: On model selection consistency of penalized M-estimators: a geometric theory »
  Jason D Lee · Yuekai Sun · Jonathan E Taylor
- 2013 Poster: Using multiple samples to learn mixture models »
  Jason D Lee · Ran Gilad-Bachrach · Rich Caruana
- 2013 Spotlight: Using multiple samples to learn mixture models »
  Jason D Lee · Ran Gilad-Bachrach · Rich Caruana
- 2012 Poster: Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form »
  Jason D Lee · Yuekai Sun · Michael Saunders
- 2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
  Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp