Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this "benign overfitting" in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information and differ from each other only by statistically independent noise. The number of such groups increases linearly with the width of the layer, but only if the width is above a critical value. We show that redundant neurons appear only when the training is regularized and the training error is zero.
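The grouping described in the abstract can be probed empirically. The sketch below is a minimal illustration, not the authors' analysis pipeline: it builds synthetic last-hidden-layer activations with known redundant groups (identical signals plus independent per-neuron noise) and recovers the group count by hierarchical clustering on pairwise neuron correlations. The width, number of groups, noise level, and correlation threshold are all assumed values chosen for the example.

```python
# Hypothetical illustration of counting redundant neuron groups.
# All parameters (width, n_groups, noise scale, 0.95 correlation
# threshold) are assumptions for this sketch, not values from the paper.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Synthetic activations: n_groups independent signals, each copied to
# width // n_groups neurons and perturbed by independent noise, mimicking
# the redundant groups described in the abstract.
n_samples, width, n_groups = 1000, 64, 8
signals = rng.standard_normal((n_samples, n_groups))
acts = np.repeat(signals, width // n_groups, axis=1)
acts += 0.1 * rng.standard_normal(acts.shape)   # independent per-neuron noise

# Cluster neurons whose activations are nearly perfectly correlated.
corr = np.corrcoef(acts.T)                      # (width, width) correlations
dist = 1.0 - np.abs(corr)                       # correlation -> distance
condensed = dist[np.triu_indices(width, k=1)]   # condensed form for linkage
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=0.05, criterion="distance")  # 0.05 ~ |corr| > 0.95

print("estimated number of redundant groups:", labels.max())
```

Run on real activations, the same recipe would take the matrix of last-hidden-layer outputs over a held-out set in place of `acts`; under the abstract's claim, the recovered group count should grow roughly linearly with layer width once the width exceeds the critical value.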
Author Information
Diego Doimo (International School for Advanced Studies (SISSA))
Aldo Glielmo (Banca d'Italia)
Sebastian Goldt (SISSA, Trieste, Italy)
Alessandro Laio (International School for Advanced Studies (SISSA))
More from the Same Authors
- 2022 : Data-driven emergence of convolutional structure in neural networks
  Alessandro Ingrosso · Sebastian Goldt
- 2023 Poster: The geometry of hidden representations of large transformer models
  Lucrezia Valeriani · Diego Doimo · Francesca Cuturello · Alessandro Laio · Alessio Ansuini · Alberto Cazzaniga
- 2023 Poster: Attacks on Online Learners: a Teacher-Student Analysis
  Riccardo Giuseppe Margiotta · Sebastian Goldt · Guido Sanguinetti
- 2022 Poster: Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks
  Paolo Muratore · Sina Tafazoli · Eugenio Piasini · Alessandro Laio · Davide Zoccolan
- 2021 Poster: Learning curves of generic features maps for realistic datasets with a teacher-student model
  Bruno Loureiro · Cedric Gerbelot · Hugo Cui · Sebastian Goldt · Florent Krzakala · Marc Mezard · Lenka Zdeborová
- 2020 Poster: Hierarchical nucleation in deep neural networks
  Diego Doimo · Aldo Glielmo · Alessio Ansuini · Alessandro Laio
- 2019 : Lunch Break and Posters
  Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
- 2019 Poster: Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
  Sebastian Goldt · Madhu Advani · Andrew Saxe · Florent Krzakala · Lenka Zdeborová
- 2019 Oral: Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
  Sebastian Goldt · Madhu Advani · Andrew Saxe · Florent Krzakala · Lenka Zdeborová
- 2019 Poster: Intrinsic dimension of data representations in deep neural networks
  Alessio Ansuini · Alessandro Laio · Jakob H Macke · Davide Zoccolan