Timezone: »
Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average nodes population. We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss, for which the dynamics can be explicitly solved in the infinite dataset limit. This allow us to address in a simple setting several phenomena appearing in modern networks such as slowing down of training dynamics, crossover between feature and lazy learning, and overfitting. Finally, we asses the limitations of mean-field theory by studying the case of large but finite number of nodes and of training samples.
Author Information
Franco Pellegrini (École normale supérieure, Paris)
Giulio Biroli (ENS)
More from the Same Authors
-
2020 Poster: Triple descent and the two kinds of overfitting: where & why do they appear? »
Stéphane d'Ascoli · Levent Sagun · Giulio Biroli -
2020 Spotlight: Triple descent and the two kinds of overfitting: where & why do they appear? »
Stéphane d'Ascoli · Levent Sagun · Giulio Biroli -
2020 Poster: Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Pierfrancesco Urbani · Lenka Zdeborová -
2019 Poster: Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias »
Stéphane d'Ascoli · Levent Sagun · Giulio Biroli · Joan Bruna -
2019 Poster: Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Lenka Zdeborová -
2019 Spotlight: Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Lenka Zdeborová