Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) are captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large input dimension. Using this framework, we calculate the final generalisation error of student networks that have more parameters than their teachers. We find that the final generalisation error of the student increases with network size when training only the first layer, but stays constant or even decreases with size when training both layers. We show that these different behaviours have their root in the different solutions SGD finds for different activation functions. Our results indicate that achieving good generalisation in neural networks goes beyond the properties of SGD alone and depends on the interplay of at least the algorithm, the model architecture, and the data set.
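The teacher-student setup described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes i.i.d. standard Gaussian inputs, a tanh activation chosen for convenience, a squared loss, and one fresh sample per SGD step. The function names, the hyperparameter values, and the 1/N scaling of the second-layer learning rate are illustrative assumptions.

```python
import numpy as np


def forward(w, v, x, g):
    """Two-layer network: sum_k v_k * g(w_k . x / sqrt(N))."""
    n = x.shape[-1]
    return g(x @ w.T / np.sqrt(n)) @ v


def generalisation_error(w, v, w_t, v_t, g, rng, n_test=5000):
    """Monte Carlo estimate of 0.5 * E[(student - teacher)^2] on fresh inputs."""
    x = rng.standard_normal((n_test, w.shape[1]))
    diff = forward(w, v, x, g) - forward(w_t, v_t, x, g)
    return 0.5 * np.mean(diff ** 2)


def train_student(n=50, m=2, k=4, lr=0.2, steps=10000,
                  train_second_layer=True, seed=0):
    """Online SGD for a student with k hidden units on a teacher with m units.

    All defaults are hypothetical values picked for a quick demonstration.
    Returns the estimated generalisation error before and after training.
    """
    rng = np.random.default_rng(seed)
    g = np.tanh                                  # assumed activation
    dg = lambda a: 1.0 - np.tanh(a) ** 2         # its derivative

    w_t, v_t = rng.standard_normal((m, n)), np.ones(m)   # fixed teacher
    w, v = rng.standard_normal((k, n)), np.ones(k)       # student (k > m)

    err_start = generalisation_error(w, v, w_t, v_t, g, rng)
    for _ in range(steps):
        x = rng.standard_normal(n)               # one fresh sample per step
        a = w @ x / np.sqrt(n)                   # student pre-activations
        delta = g(a) @ v - forward(w_t, v_t, x, g)   # student minus teacher
        # Gradients of 0.5 * delta^2, both computed before any update.
        grad_w = delta * (v * dg(a))[:, None] * x[None, :] / np.sqrt(n)
        grad_v = delta * g(a)
        w -= lr * grad_w
        if train_second_layer:
            v -= (lr / n) * grad_v               # assumed 1/N scaling
    err_end = generalisation_error(w, v, w_t, v_t, g, rng)
    return err_start, err_end


if __name__ == "__main__":
    for both in (False, True):
        e0, e1 = train_student(train_second_layer=both)
        print(f"train both layers={both}: error {e0:.3f} -> {e1:.3f}")
```

Varying `k` while keeping `m` fixed, with `train_second_layer` switched on or off, gives a rough numerical view of the comparison the abstract describes. A single short run like this is only indicative and is not a substitute for the differential-equation analysis.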
Author Information
Sebastian Goldt (Institut de Physique Théorique, CNRS, Paris)
Madhu Advani (Apple)
Andrew Saxe (University of Oxford)
Florent Krzakala (École Normale Supérieure)
Lenka Zdeborová (CEA Saclay)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup »
Thu. Dec 12th 01:00 -- 03:00 AM Room East Exhibition Hall B + C #238
More from the Same Authors
-
2021 Spotlight: Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions »
Bruno Loureiro · Gabriele Sicuro · Cedric Gerbelot · Alessandro Pacco · Florent Krzakala · Lenka Zdeborová -
2022 : Data-driven emergence of convolutional structure in neural networks »
Alessandro Ingrosso · Sebastian Goldt -
2022 Workshop: Machine Learning and the Physical Sciences »
Atilim Gunes Baydin · Adji Bousso Dieng · Emine Kucukbenli · Gilles Louppe · Siddharth Mishra-Sharma · Benjamin Nachman · Brian Nord · Savannah Thais · Anima Anandkumar · Kyle Cranmer · Lenka Zdeborová · Rianne van den Berg -
2022 Poster: Redundant representations help generalization in wide neural networks »
Diego Doimo · Aldo Glielmo · Sebastian Goldt · Alessandro Laio -
2022 Poster: Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks »
Rodrigo Veiga · Ludovic Stephan · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová -
2022 Poster: Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap »
Luca Pesce · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová -
2022 Poster: Multi-layer State Evolution Under Random Convolutional Design »
Max Daniels · Cedric Gerbelot · Florent Krzakala · Lenka Zdeborová -
2021 Workshop: Machine Learning and the Physical Sciences »
Anima Anandkumar · Kyle Cranmer · Mr. Prabhat · Lenka Zdeborová · Atilim Gunes Baydin · Juan Carrasquilla · Emine Kucukbenli · Gilles Louppe · Benjamin Nachman · Brian Nord · Savannah Thais -
2021 Poster: Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions »
Bruno Loureiro · Gabriele Sicuro · Cedric Gerbelot · Alessandro Pacco · Florent Krzakala · Lenka Zdeborová -
2021 Poster: Learning curves of generic features maps for realistic datasets with a teacher-student model »
Bruno Loureiro · Cedric Gerbelot · Hugo Cui · Sebastian Goldt · Florent Krzakala · Marc Mezard · Lenka Zdeborová -
2021 Poster: Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime »
Hugo Cui · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová -
2020 : Opening Remarks »
Reinhard Heckel · Paul Hand · Soheil Feizi · Lenka Zdeborová · Richard Baraniuk -
2020 Workshop: Workshop on Deep Learning and Inverse Problems »
Reinhard Heckel · Paul Hand · Richard Baraniuk · Lenka Zdeborová · Soheil Feizi -
2020 Workshop: Machine Learning and the Physical Sciences »
Anima Anandkumar · Kyle Cranmer · Shirley Ho · Mr. Prabhat · Lenka Zdeborová · Atilim Gunes Baydin · Juan Carrasquilla · Adji Bousso Dieng · Karthik Kashinath · Gilles Louppe · Brian Nord · Michela Paganini · Savannah Thais -
2020 Poster: Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization »
Benjamin Aubin · Florent Krzakala · Yue Lu · Lenka Zdeborová -
2020 Poster: Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions »
Stefano Sarao Mannelli · Eric Vanden-Eijnden · Lenka Zdeborová -
2020 Poster: Characterizing emergent representations in a space of candidate learning rules for deep networks »
Yinan Cao · Christopher Summerfield · Andrew Saxe -
2020 Poster: Phase retrieval in high dimensions: Statistical and computational phase transitions »
Antoine Maillard · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová -
2020 Poster: Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification »
Francesca Mignacco · Florent Krzakala · Pierfrancesco Urbani · Lenka Zdeborová -
2020 Poster: Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Pierfrancesco Urbani · Lenka Zdeborová -
2019 : Lenka Zdeborova »
Lenka Zdeborová -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 : Surya Ganguli, Yasaman Bahri, Florent Krzakala moderated by Lenka Zdeborova »
Florent Krzakala · Yasaman Bahri · Surya Ganguli · Lenka Zdeborová · Adji Bousso Dieng · Joan Bruna -
2019 : Florent Krzakala - Learning with "realistic" synthetic data »
Florent Krzakala -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 : The spiked matrix model with generative priors »
Lenka Zdeborová -
2019 Poster: The spiked matrix model with generative priors »
Benjamin Aubin · Bruno Loureiro · Antoine Maillard · Florent Krzakala · Lenka Zdeborová -
2019 Poster: Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Lenka Zdeborová -
2019 Spotlight: Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Lenka Zdeborová -
2018 Poster: Entropy and mutual information in models of deep neural networks »
Marylou Gabrié · Andre Manoel · Clément Luneau · Jean Barbier · Nicolas Macris · Florent Krzakala · Lenka Zdeborová -
2018 Poster: The committee machine: Computational to statistical gaps in learning a two-layers neural network »
Benjamin Aubin · Antoine Maillard · Jean Barbier · Florent Krzakala · Nicolas Macris · Lenka Zdeborová -
2018 Spotlight: The committee machine: Computational to statistical gaps in learning a two-layers neural network »
Benjamin Aubin · Antoine Maillard · Jean Barbier · Florent Krzakala · Nicolas Macris · Lenka Zdeborová -
2018 Spotlight: Entropy and mutual information in models of deep neural networks »
Marylou Gabrié · Andre Manoel · Clément Luneau · Jean Barbier · Nicolas Macris · Florent Krzakala · Lenka Zdeborová -
2016 Poster: Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula »
Jean Barbier · Mohamad Dia · Nicolas Macris · Florent Krzakala · Thibault Lesieur · Lenka Zdeborová -
2015 Poster: Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation »
Alaa Saade · Florent Krzakala · Lenka Zdeborová -
2014 Poster: Spectral Clustering of graphs with the Bethe Hessian »
Alaa Saade · Florent Krzakala · Lenka Zdeborová -
2013 Poster: Blind Calibration in Compressed Sensing using Message Passing Algorithms »
Christophe Schulke · Francesco Caltagirone · Florent Krzakala · Lenka Zdeborová