We perform a careful, thorough, and large-scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite-width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite-width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends nonmonotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layerwise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.
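Two of the kernel techniques named in the abstract, diagonal regularization of the kernel matrix and regularized ZCA whitening of the inputs, can be sketched generically. The following is a minimal numpy illustration, not the paper's implementation: an RBF kernel stands in for the NNGP/NT kernels, and the function names, the trace-scaled regularizer, and the eigenvalue-shrinkage convention in the whitener are assumptions made for this example.

```python
import numpy as np

def zca_whiten_fit(X, reg=0.1):
    """Regularized ZCA whitening fit on training data.

    Assumed convention: eigenvalues are shrunk toward their mean by
    `reg` before inversion, so extreme rescaling is damped.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = 1.0 / np.sqrt(eigvals + reg * eigvals.mean())
    # Rotate into the eigenbasis, rescale, rotate back (the "ZCA" form).
    W = eigvecs @ np.diag(scale) @ eigvecs.T
    return mean, W

def kernel_predict(K_train, K_test_train, y_train, diag_reg=1e-4):
    # Diagonal regularization: add diag_reg * (tr K / n) to the diagonal
    # before solving; larger diag_reg acts like earlier stopping.
    n = K_train.shape[0]
    ridge = diag_reg * np.trace(K_train) / n
    alpha = np.linalg.solve(K_train + ridge * np.eye(n), y_train)
    return K_test_train @ alpha

def rbf(A, B, gamma=0.1):
    # Generic RBF kernel as a stand-in for an NNGP/NT kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
y_train = np.sin(X_train.sum(axis=1, keepdims=True))
X_test = rng.normal(size=(10, 5))

# Fit the whitener on the training split only, then apply to both splits.
mean, W = zca_whiten_fit(X_train)
Xw_train = (X_train - mean) @ W
Xw_test = (X_test - mean) @ W

preds = kernel_predict(rbf(Xw_train, Xw_train), rbf(Xw_test, Xw_train), y_train)
print(preds.shape)  # (10, 1)
```

The trace-normalized ridge makes `diag_reg` insensitive to the overall kernel scale, which is one common way to keep a single regularization sweep meaningful across different kernels.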
Author Information
Jaehoon Lee (Google Brain)
Sam Schoenholz (Google Brain)
Jeffrey Pennington (Google Brain)
Ben Adlam (Google)
Lechao Xiao (Google Brain)
Lechao is an AI Resident on the Brain team at Google, where he works on machine learning and deep learning. Prior to Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he worked on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at Urbana-Champaign and his BA in pure and applied math from Zhejiang University, Hangzhou, China. Lechao's research interests include the theory of machine learning and deep learning, optimization, Gaussian processes, and generalization. He is particularly interested in research problems that combine theory and practice. He developed (with his coauthors) a mean field theory for convolutional neural networks, and several novel initialization methods (orthogonal convolutional kernels and the delta-orthogonal kernel) that allow practitioners to train neural networks with more than 10,000 layers without the use of any common techniques.
Roman Novak (Google Brain)
Jascha Sohl-Dickstein (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)

2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study »
Fri Dec 11th 03:50 – 04:00 AM, Room: Orals & Spotlights: Deep Learning
More from the Same Authors

2020 Poster: Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling »
Tong Che · Ruixiang ZHANG · Jascha Sohl-Dickstein · Hugo Larochelle · Liam Paull · Yuan Cao · Yoshua Bengio
2020 Poster: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington
2020 Spotlight: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington
2020 Poster: JAX MD: A Framework for Differentiable Physics »
Samuel Schoenholz · Ekin Dogus Cubuk 
2020 Spotlight: JAX MD: A Framework for Differentiable Physics »
Samuel Schoenholz · Ekin Dogus Cubuk 
2020 Poster: Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition »
Ben Adlam · Jeffrey Pennington 
2019 Poster: Learning GANs and Ensembles Using Discrepancy »
Ben Adlam · Corinna Cortes · Mehryar Mohri · Ningshan Zhang 
2019 Poster: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent »
Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha Sohl-Dickstein · Jeffrey Pennington
2019 Poster: MetaInit: Initializing learning by learning to initialize »
Yann Dauphin · Samuel Schoenholz 
2019 Poster: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth
2019 Spotlight: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth
2018 Poster: The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network »
Jeffrey Pennington · Pratik Worah
2018 Poster: PCA of high dimensional random walks with comparison to neural network training »
Joseph Antognini · Jascha Sohl-Dickstein
2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans »
Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein
2017 Spotlight: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah 
2017 Poster: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein
2017 Poster: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah 
2017 Oral: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein
2017 Poster: Mean Field Residual Networks: On the Edge of Chaos »
Ge Yang · Samuel Schoenholz 
2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice »
Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli 
2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability »
Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein
2016 Workshop: Brains and Bits: Neuroscience meets Machine Learning »
Alyson Fletcher · Eva Dyer · Jascha Sohl-Dickstein · Joshua T Vogelstein · Konrad Koerding · Jakob H Macke
2016 Poster: Exponential expressivity in deep neural networks through transient chaos »
Ben Poole · Subhaneil Lahiri · Maithra Raghu · Jascha Sohl-Dickstein · Surya Ganguli
2015 Workshop: Statistical Methods for Understanding Neural Systems »
Alyson Fletcher · Jakob H Macke · Ryan Adams · Jascha Sohl-Dickstein
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar 
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar 
2015 Poster: Deep Knowledge Tracing »
Chris Piech · Jonathan Bassen · Jonathan Huang · Surya Ganguli · Mehran Sahami · Leonidas Guibas · Jascha Sohl-Dickstein
2012 Poster: Training sparse natural image models with a fast Gibbs sampler of an extended state space »
Lucas Theis · Jascha Sohl-Dickstein · Matthias Bethge