Timezone: »
Modern neural networks are often regarded as complex blackbox functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of wellbehaved input distributions, the earlytime learning dynamics of a twolayer fullyconnected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training.
Author Information
Wei Hu (Princeton University)
Lechao Xiao (Google Research)
Lechao is a research scientist in the Brain team in Google Research, where he is working on machine learning and deep learning. Prior to Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he was working on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at UrbanaChampaign and his BA in pure and applied math from Zhejiang University, Hangzhou, China. Lechao research interests include theory of machine learning and deep learning, optimization, Gaussian process, generalization, etc.
Ben Adlam (Google)
Jeffrey Pennington (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)

2020 Spotlight: The Surprising Simplicity of the EarlyTime Learning Dynamics of Neural Networks »
Thu. Dec 10th 03:30  03:40 PM Room Orals & Spotlights: Deep Learning
More from the Same Authors

2022 : A Secondorder Regression Model Shows Edge of Stability Behavior »
Fabian Pedregosa · Atish Agarwala · Jeffrey Pennington 
2022 : Are Neurons Actually Collapsed? On the FineGrained Structure in Neural Representations »
Yongyi Yang · Jacob Steinhardt · Wei Hu 
2022 Poster: Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions »
Courtney Paquette · Elliot Paquette · Ben Adlam · Jeffrey Pennington 
2022 Poster: Precise Learning Curves and HigherOrder Scalings for Dotproduct Kernel Regression »
Lechao Xiao · Hong Hu · Theodor Misiakiewicz · Yue Lu · Jeffrey Pennington 
2022 Poster: Fast Neural Kernel Embeddings for General Activations »
Insu Han · Amir Zandieh · Jaehoon Lee · Roman Novak · Lechao Xiao · Amin Karbasi 
2021 Poster: Overparameterization Improves Robustness to Covariate Shift in High Dimensions »
Nilesh Tripuraneni · Ben Adlam · Jeffrey Pennington 
2020 Poster: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha SohlDickstein 
2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha SohlDickstein 
2020 Poster: Understanding Double Descent Requires A FineGrained BiasVariance Decomposition »
Ben Adlam · Jeffrey Pennington 
2019 Poster: Learning GANs and Ensembles Using Discrepancy »
Ben Adlam · Corinna Cortes · Mehryar Mohri · Ningshan Zhang 
2019 Poster: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent »
Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha SohlDickstein · Jeffrey Pennington 
2019 Poster: Explaining Landscape Connectivity of Lowcost Solutions for Multilayer Nets »
Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora 
2019 Poster: Implicit Regularization in Deep Matrix Factorization »
Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo 
2019 Spotlight: Implicit Regularization in Deep Matrix Factorization »
Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo 
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang 
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang 
2018 Poster: Online Improper Learning with an Approximation Oracle »
Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li 
2018 Poster: Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced »
Simon Du · Wei Hu · Jason Lee 
2018 Poster: The Spectrum of the Fisher Information Matrix of a SingleHiddenLayer Neural Network »
Jeffrey Pennington · Pratik Worah 
2017 Spotlight: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah 
2017 Poster: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah 
2017 Poster: Linear Convergence of a FrankWolfe Type Algorithm over TraceNorm Balls »
Zeyuan AllenZhu · Elad Hazan · Wei Hu · Yuanzhi Li 
2017 Spotlight: Linear Convergence of a FrankWolfe Type Algorithm over TraceNorm Balls »
Zeyuan AllenZhu · Elad Hazan · Wei Hu · Yuanzhi Li 
2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice »
Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli 
2016 Poster: Combinatorial MultiArmed Bandit with General Reward Functions »
Wei Chen · Wei Hu · Fu Li · Jian Li · Yu Liu · Pinyan Lu 
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar 
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar