Timezone: »

Finite Versus Infinite Neural Networks: an Empirical Study
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein

Thu Dec 10 07:50 PM -- 08:00 PM (PST) @ Orals & Spotlights: Deep Learning

We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.

Author Information

Jaehoon Lee (Google Brain)
Sam Schoenholz (Google Brain)
Jeffrey Pennington (Google Brain)
Ben Adlam (Google)
Lechao Xiao (Google Brain)

Lechao is an AI resident on the Brain team at Google, where he is working on machine learning and deep learning. Prior to Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he was working on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at Urbana-Champaign and his BA in pure and applied math from Zhejiang University, Hangzhou, China. Lechao research interests include theory of machine learning and deep learning, optimization, Gaussian process, generalization, etc. He is particularly interested in research problems that has a good combination of theory and practice. He developed (with his coauthor) a mean field theory for convolutional neural networks. He developed several novel initialization methods (orthogonal convolutional kernel and delta orthogonal kernel) which allow practitioners to train neural networks with more than 10,000 layers without the use of any common techniques.

Roman Novak (Google Brain)
Jascha Sohl-Dickstein (Google Brain)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors