Timezone: »
Poster
Implicit Regularization and Convergence for Weight Normalization
Xiaoxia Wu · Edgar Dobriban · Tongzheng Ren · Shanshan Wu · Zhiyuan Li · Suriya Gunasekar · Rachel Ward · Qiang Liu
Normalization methods such as batch, weight, instance, and layer normalization are commonly used in modern machine learning. Here, we study the weight normalization (WN) method \cite{salimans2016weight} and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector such that the objective function becomes \emph{non-convex}. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and \emph{converge linearly} close to the minimum $\ell_2$ norm solution even for initializations far from zero. For certain two-phase variants, they can converge to the min norm solution. This is different from the behavior of gradient descent, which only converges to the min norm solution when started at zero, and thus more sensitive to initialization.
Author Information
Xiaoxia (Shirley) Wu (The University of Texas at Austin)
Edgar Dobriban (University of Pennsylvania)
Tongzheng Ren (UT Austin)
Shanshan Wu (University of Texas at Austin)
Here is my homepage: http://wushanshan.github.io/
Zhiyuan Li (Princeton University)
Suriya Gunasekar (Microsoft Research Redmond)
Rachel Ward (UT Austin)
Qiang Liu (Dartmouth College)
More from the Same Authors
-
2020 Poster: Stein Self-Repulsive Dynamics: Benefits From Past Samples »
Mao Ye · Tongzheng Ren · Qiang Liu -
2020 Poster: Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate »
Zhiyuan Li · Kaifeng Lyu · Sanjeev Arora -
2020 Poster: Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform »
Jonathan Lacotte · Sifan Liu · Edgar Dobriban · Mert Pilanci -
2020 Poster: A Group-Theoretic Framework for Data Augmentation »
Shuxiao Chen · Edgar Dobriban · Jane Lee -
2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy »
Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry -
2020 Oral: A Group-Theoretic Framework for Data Augmentation »
Shuxiao Chen · Edgar Dobriban · Jane Lee -
2019 Poster: Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets »
Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora -
2019 Poster: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models »
Shanshan Wu · Sujay Sanghavi · Alexandros Dimakis -
2019 Spotlight: Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models »
Shanshan Wu · Sujay Sanghavi · Alexandros Dimakis -
2019 Poster: Learning Distributions Generated by One-Layer ReLU Networks »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2018 Poster: Online Improper Learning with an Approximation Oracle »
Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li -
2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Poster: On preserving non-discrimination when combining expert advice »
Avrim Blum · Suriya Gunasekar · Thodoris Lykouris · Nati Srebro -
2017 Poster: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2017 Spotlight: Implicit Regularization in Matrix Factorization »
Suriya Gunasekar · Blake Woodworth · Srinadh Bhojanapalli · Behnam Neyshabur · Nati Srebro -
2016 Poster: Leveraging Sparsity for Efficient Submodular Data Summarization »
Erik Lindgren · Shanshan Wu · Alexandros Dimakis -
2016 Poster: Preference Completion from Partial Rankings »
Suriya Gunasekar · Sanmi Koyejo · Joydeep Ghosh -
2016 Poster: Single Pass PCA of Matrix Products »
Shanshan Wu · Srinadh Bhojanapalli · Sujay Sanghavi · Alexandros Dimakis -
2016 Poster: Solving Marginal MAP Problems with NP Oracles and Parity Constraints »
Yexiang Xue · zhiyuan li · Stefano Ermon · Carla Gomes · Bart Selman -
2016 Poster: Learning in Games: Robustness of Fast Convergence »
Dylan Foster · zhiyuan li · Thodoris Lykouris · Karthik Sridharan · Eva Tardos -
2015 Poster: Unified View of Matrix Completion under General Structural Constraints »
Suriya Gunasekar · Arindam Banerjee · Joydeep Ghosh