
Implicit Regularization and Convergence for Weight Normalization
Xiaoxia Wu · Edgar Dobriban · Tongzheng Ren · Shanshan Wu · Zhiyuan Li · Suriya Gunasekar · Rachel Ward · Qiang Liu

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1555
Normalization methods such as batch, weight, instance, and layer normalization are commonly used in modern machine learning. Here, we study the weight normalization (WN) method \cite{salimans2016weight} and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and for some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector, so that the objective function becomes \emph{non-convex}. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and \emph{converge linearly} close to the minimum $\ell_2$ norm solution, even for initializations far from zero. For certain two-phase variants, they can converge to the min norm solution. This differs from the behavior of gradient descent, which converges to the min norm solution only when started at zero and is thus more sensitive to initialization.
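The contrast drawn above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the helpers `min_norm_solution`, `gd_least_squares`, `wn_weights`, and `wn_grads` are hypothetical, and the problem sizes, step size, iteration count, and random seed are illustrative assumptions. It shows (a) that plain gradient descent on an overparametrized least squares problem started away from zero keeps the null-space component of its initialization, landing on $X^+y + (I - X^+X)w_0$ instead of the min norm solution $X^+y$, and (b) the WN reparametrization $w = g\, v/\|v\|$ with gradients for $g$ and $v$ obtained by the chain rule. The rPGD update and the two-phase schedules are not implemented here.

```python
import numpy as np


def min_norm_solution(X, y):
    # Minimum l2-norm interpolating solution X^+ y (hypothetical helper).
    return np.linalg.pinv(X) @ y


def gd_least_squares(X, y, w0, lr, steps):
    # Plain gradient descent on f(w) = 0.5 * ||X w - y||^2 (hypothetical helper).
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w


def wn_weights(g, v):
    # Weight normalization: scale g times the unit direction v / ||v||.
    return g * v / np.linalg.norm(v)


def wn_grads(X, y, g, v):
    # Chain-rule gradients of f(g * v / ||v||) with respect to g and v.
    r = np.linalg.norm(v)
    u = v / r
    grad_w = X.T @ (X @ (g * u) - y)           # gradient in the original weights
    grad_g = u @ grad_w                        # d f / d g
    grad_v = (g / r) * (grad_w - u * (u @ grad_w))  # (g/r)(I - u u^T) grad_w
    return grad_g, grad_v


# Illustrative overparametrized problem: fewer samples (n) than features (d).
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_star = min_norm_solution(X, y)
w0 = rng.standard_normal(d)                    # initialization far from zero
lr = 1.0 / np.linalg.norm(X, 2) ** 2           # 1 / (largest singular value)^2
w_gd = gd_least_squares(X, y, w0, lr, 5000)

# GD never changes the component of w in the null space of X,
# so it converges to w_star plus the null-space part of w0.
P_null = np.eye(d) - np.linalg.pinv(X) @ X
print("||w_star|| =", np.linalg.norm(w_star))
print("||w_gd||   =", np.linalg.norm(w_gd))
```

One design point worth noting: `grad_v` is orthogonal to `v`, so a plain gradient step on the direction satisfies $\|v - \eta\,\nabla_v\|^2 = \|v\|^2 + \eta^2\|\nabla_v\|^2$ and can never shrink $\|v\|$; the effective scale of the weights is controlled through $g$.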

Author Information

Xiaoxia (Shirley) Wu (The University of Texas at Austin)

Hi, I am a sixth-year PhD student in the Department of Mathematics at UT Austin. I am interested in machine learning and deep neural networks.

Edgar Dobriban (University of Pennsylvania)
Tongzheng Ren (UT Austin)
Shanshan Wu (University of Texas at Austin)

Here is my homepage: http://wushanshan.github.io/

Zhiyuan Li (Princeton University)
Suriya Gunasekar (Microsoft Research Redmond)
Rachel Ward (UT Austin)
Qiang Liu (Dartmouth College)
