In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics are well approximated by a linear expansion of the network about its initial weights. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures that characterize loss landscape geometry and NTK dynamics. Across multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a highly chaotic rapid initial transient that within 2 to 3 epochs determines the final linearly connected basin of low loss containing the end point of training. During this chaotic transient, the NTK changes rapidly, learning useful features from the training data that enable it to outperform the standard initial NTK by a factor of 3 within 3 to 4 epochs. After this rapid chaotic transient, the NTK changes at constant velocity, and its performance matches that of full network training in 15% to 45% of training time. Overall, our analysis reveals striking correlations between a diverse set of metrics over training time, governed by a rapid chaotic-to-stable transition in the first few epochs; together these findings pose challenges and opportunities for the development of more accurate theories of deep learning.
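To make the two objects the abstract compares concrete, here is a minimal sketch (not the paper's code) of the linearized network and the empirical, data-dependent NTK. The toy MLP, the names `mlp` and `params0`, and the batch size are illustrative assumptions; the paper's measurements are taken on much larger architectures.

```python
# Minimal sketch of (1) a network linearized at initialization and
# (2) the empirical NTK, computed with JAX. Illustrative only.
import jax
import jax.numpy as jnp

def mlp(params, x):
    """Toy two-layer network with scalar output (assumed for illustration)."""
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = (jax.random.normal(k1, (8, 16)) / jnp.sqrt(8.0),
           jax.random.normal(k2, (16, 1)) / jnp.sqrt(16.0))
x = jax.random.normal(k3, (4, 8))  # a small batch of inputs

# Linearization about the initial weights:
#   f_lin(w, x) = f(w0, x) + <grad_w f(w0, x), w - w0>,
# computed as a Jacobian-vector product in the direction w - w0.
f_lin = lambda params, xs: (
    mlp(params0, xs)
    + jax.jvp(lambda p: mlp(p, xs), (params0,),
              (jax.tree_util.tree_map(lambda a, b: a - b, params, params0),))[1]
)

# Empirical NTK: K(x_i, x_j) = <grad_w f(x_i), grad_w f(x_j)>,
# assembled here from the full Jacobian of the flattened outputs.
def empirical_ntk(params, xs):
    jac = jax.jacobian(lambda p: mlp(p, xs).ravel())(params)
    jac_flat = jnp.concatenate([j.reshape(xs.shape[0], -1)
                                for j in jax.tree_util.tree_leaves(jac)], axis=1)
    return jac_flat @ jac_flat.T  # (batch, batch) kernel matrix

print(jnp.allclose(f_lin(params0, x), mlp(params0, x)))  # True: they agree at w0
K0 = empirical_ntk(params0, x)
print(K0.shape)  # (4, 4)
```

Tracking `empirical_ntk` at successive checkpoints is one way to measure the kernel "velocity" the abstract refers to: rapid change during the early chaotic transient, roughly constant change afterwards.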
Author Information
Stanislav Fort (Stanford University / Google Research)
Gintare Karolina Dziugaite (Element AI)
Mansheej Paul (Stanford University)
Sepideh Kharaghani (Element AI)
Dan Roy (Univ of Toronto & Vector)
Surya Ganguli (Stanford)
More from the Same Authors
- 2020 Poster: Predictive coding in balanced neural networks with noise, chaos and delays
  Jonathan Kadmon · Jonathan Timcheck · Surya Ganguli
- 2020 Poster: Identifying Learning Rules From Neural Network Observables
  Aran Nayebi · Sanjana Srivastava · Surya Ganguli · Daniel Yamins
- 2020 Spotlight: Identifying Learning Rules From Neural Network Observables
  Aran Nayebi · Sanjana Srivastava · Surya Ganguli · Daniel Yamins
- 2020 Poster: Adaptive Gradient Quantization for Data-Parallel SGD
  Fartash Faghri · Iman Tabrizian · Ilia Markov · Dan Alistarh · Daniel Roy · Ali Ramezani-Kebrya
- 2020 Poster: Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
  Mahdi Haghifam · Jeffrey Negrea · Ashish Khisti · Daniel Roy · Gintare Karolina Dziugaite
- 2020 Poster: In search of robust measures of generalization
  Gintare Karolina Dziugaite · Alexandre Drouin · Brady Neal · Nitarshan Rajkumar · Ethan Caballero · Linbo Wang · Ioannis Mitliagkas · Daniel Roy
- 2020 Poster: Pruning neural networks without any data by iteratively conserving synaptic flow
  Hidenori Tanaka · Daniel Kunin · Daniel Yamins · Surya Ganguli
- 2019 Workshop: Machine Learning with Guarantees
  Ben London · Gintare Karolina Dziugaite · Daniel Roy · Thorsten Joachims · Aleksander Madry · John Shawe-Taylor
- 2019 Poster: Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
  Jeffrey Negrea · Mahdi Haghifam · Gintare Karolina Dziugaite · Ashish Khisti · Daniel Roy
- 2019 Poster: A unified theory for the origin of grid cells through the lens of pattern formation
  Ben Sorscher · Gabriel Mel · Surya Ganguli · Samuel Ocko
- 2019 Poster: Universality and individuality in neural dynamics across large populations of recurrent networks
  Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo
- 2019 Spotlight: A unified theory for the origin of grid cells through the lens of pattern formation
  Ben Sorscher · Gabriel Mel · Surya Ganguli · Samuel Ocko
- 2019 Spotlight: Universality and individuality in neural dynamics across large populations of recurrent networks
  Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo
- 2019 Poster: Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes
  Jun Yang · Shengyang Sun · Daniel Roy
- 2019 Poster: From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction
  Hidenori Tanaka · Aran Nayebi · Niru Maheswaranathan · Lane McIntosh · Stephen Baccus · Surya Ganguli
- 2019 Poster: Large Scale Structure of Neural Network Loss Landscapes
  Stanislav Fort · Stanislaw Jastrzebski
- 2019 Poster: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
  Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo
- 2018 Poster: The emergence of multiple retinal cell types through efficient coding of natural movies
  Samuel Ocko · Jack Lindsey · Surya Ganguli · Stephane Deny
- 2018 Poster: Statistical mechanics of low-rank tensor decomposition
  Jonathan Kadmon · Surya Ganguli
- 2018 Poster: Data-dependent PAC-Bayes priors via differential privacy
  Gintare Karolina Dziugaite · Daniel Roy
- 2018 Poster: Task-Driven Convolutional Recurrent Models of the Visual System
  Aran Nayebi · Daniel Bear · Jonas Kubilius · Kohitij Kar · Surya Ganguli · David Sussillo · James J DiCarlo · Daniel Yamins
- 2017 Poster: Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net
  Anirudh Goyal · Nan Rosemary Ke · Surya Ganguli · Yoshua Bengio
- 2017 Poster: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
  Jeffrey Pennington · Samuel Schoenholz · Surya Ganguli
- 2016 Poster: Measuring the reliability of MCMC inference with bidirectional Monte Carlo
  Roger Grosse · Siddharth Ancha · Daniel Roy
- 2016 Poster: Exponential expressivity in deep neural networks through transient chaos
  Ben Poole · Subhaneil Lahiri · Maithra Raghu · Jascha Sohl-Dickstein · Surya Ganguli
- 2016 Poster: An equivalence between high dimensional Bayes optimal inference and M-estimation
  Madhu Advani · Surya Ganguli
- 2016 Poster: Deep Learning Models of the Retinal Response to Natural Scenes
  Lane McIntosh · Niru Maheswaranathan · Aran Nayebi · Surya Ganguli · Stephen Baccus
- 2015 Poster: Deep Knowledge Tracing
  Chris Piech · Jonathan Bassen · Jonathan Huang · Surya Ganguli · Mehran Sahami · Leonidas Guibas · Jascha Sohl-Dickstein
- 2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming
  Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden
- 2014 Workshop: Deep Learning and Representation Learning
  Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz
- 2014 Poster: Gibbs-type Indian Buffet Processes
  Creighton Heaukulani · Daniel Roy
- 2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
  Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio
- 2014 Poster: Mondrian Forests: Efficient Online Random Forests
  Balaji Lakshminarayanan · Daniel Roy · Yee Whye Teh
- 2013 Poster: A memory frontier for complex synapses
  Subhaneil Lahiri · Surya Ganguli
- 2013 Oral: A memory frontier for complex synapses
  Subhaneil Lahiri · Surya Ganguli
- 2013 Session: Session Chair
  Daniel Roy
- 2013 Session: Tutorial Session B
  Daniel Roy
- 2012 Workshop: Probabilistic Programming: Foundations and Applications (2 day)
  Vikash Mansinghka · Daniel Roy · Noah Goodman
- 2012 Poster: Random function priors for exchangeable graphs and arrays
  James R Lloyd · Daniel Roy · Peter Orbanz · Zoubin Ghahramani
- 2011 Poster: Complexity of Inference in Latent Dirichlet Allocation
  David Sontag · Daniel Roy
- 2011 Spotlight: Complexity of Inference in Latent Dirichlet Allocation
  David Sontag · Daniel Roy
- 2010 Poster: Short-term memory in neuronal networks through dynamical compressed sensing
  Surya Ganguli · Haim Sompolinsky
- 2008 Workshop: Probabilistic Programming: Universal Languages, Systems and Applications
  Daniel Roy · John Winn · David A McAllester · Vikash Mansinghka · Josh Tenenbaum
- 2008 Oral: The Mondrian Process
  Daniel Roy · Yee Whye Teh
- 2008 Poster: The Mondrian Process
  Daniel Roy · Yee Whye Teh
- 2007 Poster: Bayesian Agglomerative Clustering with Coalescents
  Yee Whye Teh · Hal Daumé III · Daniel Roy
- 2007 Oral: Bayesian Agglomerative Clustering with Coalescents
  Yee Whye Teh · Hal Daumé III · Daniel Roy
- 2006 Poster: Learning annotated hierarchies from relational data
  Daniel Roy · Charles Kemp · Vikash Mansinghka · Josh Tenenbaum
- 2006 Talk: Learning annotated hierarchies from relational data
  Daniel Roy · Charles Kemp · Vikash Mansinghka · Josh Tenenbaum