Timezone: »
Modern neural networks are often regarded as complex blackbox functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of wellbehaved input distributions, the earlytime learning dynamics of a twolayer fullyconnected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training.
Author Information
Wei Hu (Princeton University)
Lechao Xiao (Google Research)
Lechao is an AI resident on the Brain team at Google, where he is working on machine learning and deep learning. Prior to Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he was working on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at UrbanaChampaign and his BA in pure and applied math from Zhejiang University, Hangzhou, China. Lechao research interests include theory of machine learning and deep learning, optimization, Gaussian process, generalization, etc. He is particularly interested in research problems that has a good combination of theory and practice. He developed (with his coauthor) a mean field theory for convolutional neural networks. He developed several novel initialization methods (orthogonal convolutional kernel and delta orthogonal kernel) which allow practitioners to train neural networks with more than 10,000 layers without the use of any common techniques.
Ben Adlam (Google)
Jeffrey Pennington (Google Brain)
Related Events (a corresponding poster, oral, or spotlight)

2020 Spotlight: The Surprising Simplicity of the EarlyTime Learning Dynamics of Neural Networks »
Thu Dec 10th 03:30  03:40 PM Room Orals & Spotlights: Deep Learning
More from the Same Authors

2020 Poster: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha SohlDickstein 
2020 Spotlight: Finite Versus Infinite Neural Networks: an Empirical Study »
Jaehoon Lee · Samuel Schoenholz · Jeffrey Pennington · Ben Adlam · Lechao Xiao · Roman Novak · Jascha SohlDickstein 
2020 Poster: Understanding Double Descent Requires A FineGrained BiasVariance Decomposition »
Ben Adlam · Jeffrey Pennington 
2019 Poster: Learning GANs and Ensembles Using Discrepancy »
Ben Adlam · Corinna Cortes · Mehryar Mohri · Ningshan Zhang 
2019 Poster: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent »
Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha SohlDickstein · Jeffrey Pennington 
2019 Poster: Explaining Landscape Connectivity of Lowcost Solutions for Multilayer Nets »
Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora 
2019 Poster: Implicit Regularization in Deep Matrix Factorization »
Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo 
2019 Spotlight: Implicit Regularization in Deep Matrix Factorization »
Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo 
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang 
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang 
2018 Poster: Online Improper Learning with an Approximation Oracle »
Elad Hazan · Wei Hu · Yuanzhi Li · Zhiyuan Li 
2018 Poster: Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced »
Simon Du · Wei Hu · Jason Lee 
2018 Poster: The Spectrum of the Fisher Information Matrix of a SingleHiddenLayer Neural Network »
Jeffrey Pennington · Pratik Worah 
2017 Spotlight: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah 
2017 Poster: Nonlinear random matrix theory for deep learning »
Jeffrey Pennington · Pratik Worah 
2017 Poster: Linear Convergence of a FrankWolfe Type Algorithm over TraceNorm Balls »
Zeyuan AllenZhu · Elad Hazan · Wei Hu · Yuanzhi Li 
2017 Spotlight: Linear Convergence of a FrankWolfe Type Algorithm over TraceNorm Balls »
Zeyuan AllenZhu · Elad Hazan · Wei Hu · Yuanzhi Li 
2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice »
Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli 
2016 Poster: Combinatorial MultiArmed Bandit with General Reward Functions »
Wei Chen · Wei Hu · Fu Li · Jian Li · Yu Liu · Pinyan Lu 
2015 Poster: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar 
2015 Spotlight: Spherical Random Features for Polynomial Kernels »
Jeffrey Pennington · Felix Yu · Sanjiv Kumar