Oral in Workshop: Mathematics of Modern Machine Learning (M3L)
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan
Presentation: Mathematics of Modern Machine Learning (M3L)
Sat 16 Dec 6:50 a.m. PST to 3 p.m. PST
Sat 16 Dec 1 p.m. PST to 1:10 p.m. PST
Abstract:
We study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures, including convolutional ResNets and Vision Transformers, trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature-learning joint infinite-width and infinite-depth limit, and we show that finite-size network dynamics converge towards this limit.
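To make the $1/\sqrt{\text{depth}}$ branch scale concrete, here is a minimal sketch of a forward pass at random initialization. It is not the authors' code: the tanh nonlinearity, the Gaussian weights with a $1/\sqrt{\text{width}}$ factor, and the names `residual_forward`, `mean_square`, and `compare_scalings` are all assumptions chosen for illustration. It covers neither the $\mu$P parameterization nor training dynamics. It only compares how the hidden-state magnitude behaves with and without the depth-dependent branch scale.

```python
import math
import random


def residual_forward(x, depth, branch_scale, rng):
    """Run `x` through `depth` residual blocks with freshly sampled weights.

    Each block computes h <- h + branch_scale * (W @ tanh(h)) / sqrt(width),
    where W has i.i.d. standard Gaussian entries (an illustrative assumption).
    """
    width = len(x)
    h = list(x)
    for _ in range(depth):
        act = [math.tanh(v) for v in h]
        # One row of W per output coordinate, sampled on the fly.
        update = [
            sum(rng.gauss(0.0, 1.0) * a for a in act) / math.sqrt(width)
            for _ in range(width)
        ]
        h = [hv + branch_scale * u for hv, u in zip(h, update)]
    return h


def mean_square(h):
    """Average squared entry of the hidden state."""
    return sum(v * v for v in h) / len(h)


def compare_scalings(width=32, depths=(16, 64), seed=0):
    """Return {depth: (mean square with 1/sqrt(depth) scale, mean square with scale 1)}."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    results = {}
    for depth in depths:
        scaled = residual_forward(x, depth, 1.0 / math.sqrt(depth), rng)
        unscaled = residual_forward(x, depth, 1.0, rng)
        results[depth] = (mean_square(scaled), mean_square(unscaled))
    return results


if __name__ == "__main__":
    for depth, (scaled, unscaled) in compare_scalings().items():
        print(f"depth={depth:3d}  1/sqrt(depth) scale: {scaled:.2f}  unscaled: {unscaled:.2f}")
```

Under these assumptions, the mean-square hidden state with the $1/\sqrt{\text{depth}}$ scale should stay of order one at both depths, while the unscaled variant grows roughly in proportion to depth. This is one intuition for why a depth-dependent branch scale is needed before an infinite-depth limit can be well defined.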