Poster
When Are Solutions Connected in Deep Networks?
Quynh Nguyen · Pierre Bréchet · Marco Mondelli
The question of how and why the phenomenon of mode connectivity occurs in training deep neural networks has gained remarkable attention in the research community. From a theoretical perspective, two possible explanations have been proposed: (i) the loss function has connected sublevel sets, and (ii) the solutions found by stochastic gradient descent are dropout stable. While these explanations provide insights into the phenomenon, their assumptions are not always satisfied in practice. In particular, the first approach requires the network to have one layer with order of $N$ neurons ($N$ being the number of training samples), while the second one requires the loss to be almost invariant after removing half of the neurons at each layer (up to some rescaling of the remaining ones). In this work, we improve both conditions by exploiting the quality of the features at every intermediate layer together with a milder over-parameterization requirement. More specifically, we show that: (i) under generic assumptions on the features of intermediate layers, it suffices that the last two hidden layers have order of $\sqrt{N}$ neurons, and (ii) if subsets of features at each layer are linearly separable, then almost no over-parameterization is needed to show the connectivity. Our experiments confirm that the proposed condition ensures the connectivity of solutions found by stochastic gradient descent, even in settings where the previous requirements do not hold.
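To make the notion of connectivity concrete, the following is a minimal sketch (not from the paper) of how a loss barrier between two solutions is typically measured empirically: evaluate the loss along a path joining the two parameter vectors and compare it to the endpoint losses. The toy quadratic loss and the helper `barrier_along_linear_path` below are hypothetical stand-ins for a trained network's loss and parameters.

```python
# Hypothetical illustration of measuring a loss barrier between two solutions.
import numpy as np

def loss(theta):
    # Placeholder loss with two minimizers (all-ones and all-minus-ones);
    # in practice this would be the training loss of a network with parameters `theta`.
    return np.sum((theta - 1.0) ** 2) * np.sum((theta + 1.0) ** 2) / theta.size

def barrier_along_linear_path(theta_a, theta_b, n_points=50):
    """Largest excess loss on the straight line between two solutions."""
    ts = np.linspace(0.0, 1.0, n_points)
    path_losses = np.array([loss((1 - t) * theta_a + t * theta_b) for t in ts])
    endpoint_loss = max(loss(theta_a), loss(theta_b))
    return path_losses.max() - endpoint_loss

theta_a = np.full(10, 1.0)   # one toy minimizer
theta_b = np.full(10, -1.0)  # another toy minimizer
print(f"loss barrier on the linear path: {barrier_along_linear_path(theta_a, theta_b):.3f}")
```

A barrier close to zero indicates the two solutions are connected along the tested path; mode-connectivity results guarantee the existence of some low-loss path, which need not be the straight line probed in this sketch.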
Author Information
Quynh Nguyen (MPI MiS)
Pierre Bréchet (MPI MiS)
Marco Mondelli (IST Austria)
More from the Same Authors
- 2022 : Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
  Diyuan Wu · Vyacheslav Kungurtsev · Marco Mondelli
- 2022 : Poster Session 1
  Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li
- 2022 Poster: The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?
  Jean Barbier · TianQi Hou · Marco Mondelli · Manuel Saenz
- 2022 Poster: Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
  Simone Bombari · Mohammad Hossein Amani · Marco Mondelli
- 2021 Poster: PCA Initialization for Approximate Message Passing in Rotationally Invariant Models
  Marco Mondelli · Ramji Venkataramanan
- 2020 Poster: Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
  Quynh Nguyen · Marco Mondelli
- 2016 Poster: Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods
  Antoine Gautier · Quynh Nguyen · Matthias Hein