A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are exact only in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version, even for finite, practically sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
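The linearization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the one-hidden-layer tanh network, the 1/sqrt(width) output scaling, the scalar input, and the names `init_params`, `f`, `grad_f`, `f_lin`, and `perturb` are all assumptions chosen for illustration. The sketch builds the first-order Taylor expansion f_lin(θ) = f(θ₀) + ∇f(θ₀)·(θ − θ₀) around the initial parameters θ₀ and measures how far it is from the original network after a small parameter perturbation.

```python
import math
import random


def init_params(width, seed=0):
    """Draw hidden weights w and output weights a from a standard normal (assumed init)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, a


def f(params, x):
    """Toy network: f(x) = (1/sqrt(n)) * sum_i a_i * tanh(w_i * x), scalar input and output."""
    w, a = params
    return sum(ai * math.tanh(wi * x) for wi, ai in zip(w, a)) / math.sqrt(len(w))


def grad_f(params, x):
    """Analytic gradient of f with respect to (w, a) at the given parameters."""
    w, a = params
    s = 1.0 / math.sqrt(len(w))
    gw = [s * ai * (1.0 - math.tanh(wi * x) ** 2) * x for wi, ai in zip(w, a)]
    ga = [s * math.tanh(wi * x) for wi in w]
    return gw, ga


def f_lin(params, params0, x):
    """First-order Taylor expansion of f around params0, evaluated at params."""
    w, a = params
    w0, a0 = params0
    gw, ga = grad_f(params0, x)
    delta = sum(g * (p - p0) for g, p, p0 in zip(gw, w, w0))
    delta += sum(g * (p - p0) for g, p, p0 in zip(ga, a, a0))
    return f(params0, x) + delta


def perturb(params, eps, seed=1):
    """Add eps-scaled Gaussian noise to every parameter (stand-in for a training update)."""
    rng = random.Random(seed)
    return tuple([p + eps * rng.gauss(0.0, 1.0) for p in group] for group in params)


params0 = init_params(width=512)
x = 0.7

# Gap between the network and its linearization for two perturbation sizes.
near = perturb(params0, 1e-3)
far = perturb(params0, 1e-2)
gap_small = abs(f(near, x) - f_lin(near, params0, x))
gap_large = abs(f(far, x) - f_lin(far, params0, x))

print("gap at eps=1e-3:", gap_small)
print("gap at eps=1e-2:", gap_large)
```

The linearized model agrees with the network exactly at θ₀, and the gap is second order in the size of the parameter change. The sketch uses random perturbations rather than actual gradient descent, so it shows only the Taylor-expansion mechanics, not the width-dependent result stated in the abstract.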
Author Information
Jaehoon Lee (Google Brain)
Lechao Xiao (Google Brain)
Samuel Schoenholz (Google Brain)
Yasaman Bahri (Google Brain)
Roman Novak (Google Brain)
Jascha Sohl-Dickstein (Google Brain)
Jeffrey Pennington (Google Brain)
More from the Same Authors
-
2020 : End-to-End Differentiability and Tensor Processing Unit Computing to Accelerate Materials’ Inverse Design »
HAN LIU · Yuhan Liu · Zhangji Zhao · Samuel Schoenholz · Ekin Dogus Cubuk · Mathieu Bauchy -
2021 : Fast Finite Width Neural Tangent Kernel »
Roman Novak · Jascha Sohl-Dickstein · Samuel Schoenholz -
2022 : A Second-order Regression Model Shows Edge of Stability Behavior »
Fabian Pedregosa · Atish Agarwala · Jeffrey Pennington -
2022 Poster: Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions »
Courtney Paquette · Elliot Paquette · Ben Adlam · Jeffrey Pennington -
2022 Poster: Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression »
Lechao Xiao · Hong Hu · Theodor Misiakiewicz · Yue Lu · Jeffrey Pennington -
2022 Poster: Fast Neural Kernel Embeddings for General Activations »
Insu Han · Amir Zandieh · Jaehoon Lee · Roman Novak · Lechao Xiao · Amin Karbasi -
2021 Poster: Dataset Distillation with Infinitely Wide Convolutional Networks »
Timothy Nguyen · Roman Novak · Lechao Xiao · Jaehoon Lee -
2021 Poster: Overparameterization Improves Robustness to Covariate Shift in High Dimensions »
Nilesh Tripuraneni · Ben Adlam · Jeffrey Pennington -
2021 Poster: Reverse engineering learned optimizers reveals known and novel mechanisms »
Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein -
2020 : Reverse engineering learned optimizers reveals known and novel mechanisms »
Niru Maheswaranathan · David Sussillo · Luke Metz · Ruoxi Sun · Jascha Sohl-Dickstein -
2020 Poster: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein -
2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha Sohl-Dickstein -
2020 Poster: Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling »
Tong Che · Ruixiang ZHANG · Jascha Sohl-Dickstein · Hugo Larochelle · Liam Paull · Yuan Cao · Yoshua Bengio -
2020 Poster: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington -
2020 Spotlight: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks »
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington -
2020 Poster: JAX MD: A Framework for Differentiable Physics »
Samuel Schoenholz · Ekin Dogus Cubuk -
2020 Spotlight: JAX MD: A Framework for Differentiable Physics »
Samuel Schoenholz · Ekin Dogus Cubuk -
2020 Poster: Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition »
Ben Adlam · Jeffrey Pennington -
2019 : Towards an understanding of wide, deep neural networks »
Yasaman Bahri -
2019 : Afternoon Coffee Break & Poster Session »
Heidi Komkov · Stanislav Fort · Zhaoyou Wang · Rose Yu · Ji Hwan Park · Samuel Schoenholz · Taoli Cheng · Ryan-Rhys Griffiths · Chase Shimmin · Surya Karthik Mukkavili · Philippe Schwaller · Christian Knoll · Yangzesheng Sun · Keiichi Kisamori · Gavin Graham · Gavin Portwood · Hsin-Yuan Huang · Paul Novello · Moritz Munchmeyer · Anna Jungbluth · Daniel Levine · Ibrahim Ayed · Steven Atkinson · Jan Hermann · Peter Grönquist · Priyabrata Saha · Yannik Glaser · Lingge Li · Yutaro Iiyama · Rushil Anirudh · Maciej Koch-Janusz · Vikram Sundar · Francois Lanusse · Auralee Edelen · Jonas Köhler · Jacky H. T. Yip · jiadong guo · Xiangyang Ju · Adi Hanuka · Adrian Albert · Valentina Salvatelli · Mauro Verzetti · Javier Duarte · Eric Moreno · Emmanuel de Bézenac · Athanasios Vlontzos · Alok Singh · Thomas Klijnsma · Brad Neuberg · Paul Wright · Mustafa Mustafa · David Schmidt · Steven Farrell · Hao Sun -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 : JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python »
Samuel Schoenholz -
2019 : Surya Ganguli, Yasaman Bahri, Florent Krzakala moderated by Lenka Zdeborova »
Florent Krzakala · Yasaman Bahri · Surya Ganguli · Lenka Zdeborová · Adji Bousso Dieng · Joan Bruna -
2019 : Yasaman Bahri - Tractable limits for deep networks: an overview of the large width regime »
Yasaman Bahri -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 : Neural Reparameterization Improves Structural Optimization »
Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus -
2019 Poster: MetaInit: Initializing learning by learning to initialize »
Yann Dauphin · Samuel Schoenholz -
2019 Poster: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2019 Spotlight: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2018 : Poster Session 1 »
Stefan Gadatsch · Danil Kuzin · Navneet Kumar · Patrick Dallaire · Tom Ryder · Remus-Petru Pop · Nathan Hunt · Adam Kortylewski · Sophie Burkhardt · Mahmoud Elnaggar · Dieterich Lawson · Yifeng Li · Jongha (Jon) Ryu · Juhan Bae · Micha Livne · Tim Pearce · Mariia Vladimirova · Jason Ramapuram · Jiaming Zeng · Xinyu Hu · Jiawei He · Danielle Maddix · Arunesh Mittal · Albert Shaw · Tuan Anh Le · Alexander Sagel · Lisha Chen · Victor Gallego · Mahdi Karami · Zihao Zhang · Tal Kachman · Noah Weber · Matt Benatan · Kumar K Sricharan · Vincent Cartillier · Ivan Ovinnikov · Buu Phan · Mahmoud Hossam · Liu Ziyin · Valerii Kharitonov · Eugene Golikov · Qiang Zhang · Jae Myung Kim · Sebastian Farquhar · Jishnu Mukhoti · Xu Hu · Gregory Gundersen · Lavanya Sita Tekumalla · Paris Perdikaris · Ershad Banijamali · Siddhartha Jain · Ge Liu · Martin Gottwald · Katy Blumer · Sukmin Yun · Ranganath Krishnan · Roman Novak · Yilun Du · Yu Gong · Beliz Gokkaya · Jessica Ai · Daniel Duckworth · Johannes von Oswald · Christian Henning · Louis-Philippe Morency · Ali Ghodsi · Mahesh Subedar · Jean-Pascal Pfister · Rémi Lebret · Chao Ma · Aleksander Wieczorek · Laurence Perreault Levasseur -
2018 Poster: The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network »
Jeffrey Pennington · Pratik Worah -
2018 Poster: PCA of high dimensional random walks with comparison to neural network training »
Joseph Antognini · Jascha Sohl-Dickstein -
2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans »
Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein -
2017 Spotlight: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah -
2017 Poster: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein -
2017 Poster: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah -
2017 Oral: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models »
George Tucker · Andriy Mnih · Chris J Maddison · John Lawson · Jascha Sohl-Dickstein -
2017 Poster: Mean Field Residual Networks: On the Edge of Chaos »
Ge Yang · Samuel Schoenholz -
2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice »
Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli -
2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability »
Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein -
2016 : From Brains to Bits and Back Again »
Yoshua Bengio · Terrence Sejnowski · Christos H Papadimitriou · Jakob H Macke · Demis Hassabis · Alyson Fletcher · Andreas Tolias · Jascha Sohl-Dickstein · Konrad P Koerding -
2016 : Opening Remarks »
Jascha Sohl-Dickstein -
2016 Workshop: Brains and Bits: Neuroscience meets Machine Learning »
Alyson Fletcher · Eva Dyer · Jascha Sohl-Dickstein · Joshua T Vogelstein · Konrad Koerding · Jakob H Macke -
2016 Poster: Exponential expressivity in deep neural networks through transient chaos »
Ben Poole · Subhaneil Lahiri · Maithra Raghu · Jascha Sohl-Dickstein · Surya Ganguli -
2015 Workshop: Statistical Methods for Understanding Neural Systems »
Alyson Fletcher · Jakob H Macke · Ryan Adams · Jascha Sohl-Dickstein -
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar -
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar -
2015 Poster: Deep Knowledge Tracing »
Chris Piech · Jonathan Bassen · Jonathan Huang · Surya Ganguli · Mehran Sahami · Leonidas Guibas · Jascha Sohl-Dickstein -
2012 Poster: Training sparse natural image models with a fast Gibbs sampler of an extended state space »
Lucas Theis · Jascha Sohl-Dickstein · Matthias Bethge