
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
Wei Hu · Lechao Xiao · Ben Adlam · Jeffrey Pennington

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1652

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distributions, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically. Key to our analysis is to bound the spectral norm of the difference between the Neural Tangent Kernel (NTK) and an affine transform of the data kernel; however, unlike many previous results utilizing the NTK, we do not require the network to have disproportionately large width, and the network is allowed to escape the kernel regime later in training.
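To make the kernel comparison in the abstract concrete, the following is a minimal NumPy sketch, not the authors' code. It computes the empirical NTK of a two-layer ReLU network at random initialization and measures, in spectral norm, how far it is from an affine transform of the data kernel. Everything here is an assumption chosen for illustration: the Gaussian inputs, the 1/sqrt(d) and 1/sqrt(m) scalings, the sizes n, d, m, and above all the least-squares fit of the two coefficients, which stands in for the analytically derived coefficients a formal proof would use.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def empirical_ntk(X, W, a):
    """Empirical NTK of f(x) = a^T relu(W x / sqrt(d)) / sqrt(m) w.r.t. (W, a)."""
    n, d = X.shape
    m = W.shape[0]
    pre = X @ W.T / np.sqrt(d)              # (n, m) pre-activations
    act_grad = (pre > 0).astype(float)      # derivative of relu at pre
    # First-layer part: (1/m) sum_r a_r^2 relu'(.) relu'(.) * <x, x'> / d
    G = act_grad * a
    ntk_first = (G @ G.T) * (X @ X.T) / (d * m)
    # Second-layer part: (1/m) sum_r relu(.) relu(.)
    H = relu(pre)
    ntk_second = H @ H.T / m
    return ntk_first + ntk_second

def affine_fit_of_data_kernel(ntk, X):
    """Illustrative least-squares fit: ntk ~ alpha * X X^T / d + beta * 1 1^T."""
    n, d = X.shape
    basis = [X @ X.T / d, np.ones((n, n))]
    A = np.stack([B.ravel() for B in basis], axis=1)
    coef, *_ = np.linalg.lstsq(A, ntk.ravel(), rcond=None)
    approx = sum(c * B for c, B in zip(coef, basis))
    return coef, approx

# Hypothetical problem sizes and synthetic Gaussian data (assumptions).
rng = np.random.default_rng(0)
n, d, m = 50, 200, 2000
X = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

ntk = empirical_ntk(X, W, a)
coef, approx = affine_fit_of_data_kernel(ntk, X)
rel_err = np.linalg.norm(ntk - approx, 2) / np.linalg.norm(ntk, 2)
print("fitted (alpha, beta):", coef)
print("relative spectral-norm error:", rel_err)
```

A small relative error in this setup is consistent with the abstract's claim that the NTK stays close to an affine transform of the data kernel, but a single synthetic run of this kind is only an illustration and not evidence for the theorem.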

Author Information

Wei Hu (Princeton University)
Lechao Xiao (Google Research)

Lechao is an AI resident on the Brain team at Google, where he is working on machine learning and deep learning. Prior to Google Brain, he was a Hans Rademacher Instructor of Mathematics at the University of Pennsylvania, where he worked on harmonic analysis. He earned his PhD in mathematics from the University of Illinois at Urbana-Champaign and his BA in pure and applied math from Zhejiang University, Hangzhou, China. Lechao's research interests include the theory of machine learning and deep learning, optimization, Gaussian processes, and generalization. He is particularly interested in research problems that combine theory and practice well. He developed (with his coauthor) a mean field theory for convolutional neural networks. He also developed several novel initialization methods (orthogonal convolutional kernel and delta orthogonal kernel) which allow practitioners to train neural networks with more than 10,000 layers without the use of any common techniques.

Ben Adlam (Google)
Jeffrey Pennington (Google Brain)
