In many cases, neural networks trained with stochastic gradient descent (SGD) that share an early and often small portion of the training trajectory have solutions connected by a linear path of low loss. This phenomenon, called linear mode connectivity (LMC), has been leveraged for pruning and model averaging in large neural network models, but it is not well understood how broadly or why it occurs. LMC suggests that SGD trajectories somehow end up in a "convex" region of the loss landscape and stay there. In this work, we confirm that this eventually does happen by finding a high-dimensional convex hull of low loss between the endpoints of several SGD trajectories. But to our surprise, simple measures of convexity do not show any obvious transition at the point when SGD will converge into this region. To understand this convex hull better, we investigate the functional behaviors of its endpoints. We find that only a small number of correct predictions are shared between all endpoints of a hull, and an even smaller number of correct predictions are shared between the hulls, even when the final accuracy is high for every endpoint. Thus, we tie LMC more tightly to convexity, and raise several new questions about the source of this convexity in neural network optimization.
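The basic measurement behind LMC can be sketched in a few lines: train two copies of a model from a shared starting point (differing only in the order samples are drawn), then evaluate the loss at convex combinations θ(α) = (1−α)θ₁ + αθ₂ of the two solutions and report the "barrier", the excess of the worst loss on the path over the worst endpoint loss. The following is a minimal, hypothetical numpy sketch using logistic regression; because logistic loss is convex in the weights, the barrier here is trivially zero, whereas for neural networks the same quantity is exactly what the paper probes. All names (`loss`, `sgd`, the synthetic data) are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable binary classification data.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)

def loss(w):
    """Mean logistic loss of weight vector w on (X, y)."""
    s = 2 * y - 1                      # labels in {-1, +1}
    return np.mean(np.log1p(np.exp(-s * (X @ w))))

def sgd(w0, seed, steps=500, lr=0.5):
    """Plain SGD from init w0; `seed` controls the sample order."""
    r = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(steps):
        i = r.integers(len(X))
        s = 2 * y[i] - 1
        w -= lr * (-s * X[i] / (1 + np.exp(s * (X[i] @ w))))
    return w

w0 = rng.normal(size=5)                # shared initialization
w1, w2 = sgd(w0, seed=1), sgd(w0, seed=2)

# Loss along the linear path theta(a) = (1-a)*w1 + a*w2.
alphas = np.linspace(0, 1, 11)
path = [loss((1 - a) * w1 + a * w2) for a in alphas]
barrier = max(path) - max(loss(w1), loss(w2))
print(f"endpoint losses: {loss(w1):.3f}, {loss(w2):.3f}")
print(f"loss barrier along path: {barrier:.3f}")
```

For several trajectories rather than two, the same idea extends from the segment to the convex hull: sample random convex combinations of all endpoints and check whether the loss stays low throughout, which is the higher-dimensional test the abstract describes.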
Author Information
David Yunis (TTIC)
Kumar Kshitij Patel (TTIC)
Pedro Savarese (TTIC)
Gal Vardi (TTIC)
Jonathan Frankle (MIT CSAIL)
Matthew Walter (TTIC)
Karen Livescu (TTIC)
Michael Maire (University of Chicago)
More from the Same Authors
- 2022 : Unmasking the Lottery Ticket Hypothesis: Efficient Adaptive Pruning for Finding Winning Tickets
  Mansheej Paul · Feng Chen · Brett Larsen · Jonathan Frankle · Surya Ganguli · Gintare Karolina Dziugaite
- 2022 : Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates
  Jacob Portes · Davis Blalock · Cory Stephenson · Jonathan Frankle
- 2022 : Distributed Online and Bandit Convex Optimization
  Kumar Kshitij Patel · Aadirupa Saha · Nati Srebro · Lingxiao Wang
- 2022 : The Effect of Data Dimensionality on Neural Network Prunability
  Zachary Ankner · Alex Renda · Gintare Karolina Dziugaite · Jonathan Frankle · Tian Jin
- 2023 Poster: The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks
  Spencer Frei · Gal Vardi · Peter Bartlett · Nati Srebro
- 2023 Poster: Most Neural Networks Are Almost Learnable
  Amit Daniely · Nati Srebro · Gal Vardi
- 2023 Poster: Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces
  Odelia Melamed · Gilad Yehudai · Gal Vardi
- 2023 Poster: Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
  Gon Buzaglo · Niv Haim · Gilad Yehudai · Gal Vardi · Yakir Oz · Yaniv Nikankin · Michal Irani
- 2023 Poster: Provably Efficient Personalized Multi-Objective Decision Making via Comparative Feedback
  Han Shao · Lee Cohen · Avrim Blum · Yishay Mansour · Aadirupa Saha · Matthew Walter
- 2023 Poster: Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation
  Xin Yuan · Pedro Savarese · Michael Maire
- 2023 Poster: Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy
  Amit Daniely · Nati Srebro · Gal Vardi
- 2022 Panel: Panel 1C-2: Reconstructing Training Data… & On Optimal Learning…
  Gal Vardi · Idan Mehalel
- 2022 : Poster Session 2
  Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi
- 2022 Poster: On Margin Maximization in Linear and ReLU Networks
  Gal Vardi · Ohad Shamir · Nati Srebro
- 2022 Poster: Not All Bits have Equal Value: Heterogeneous Precisions via Trainable Noise
  Pedro Savarese · Xin Yuan · Yanjing Li · Michael Maire
- 2022 Poster: Towards Optimal Communication Complexity in Distributed Non-Convex Optimization
  Kumar Kshitij Patel · Lingxiao Wang · Blake Woodworth · Brian Bullins · Nati Srebro
- 2022 Poster: The Sample Complexity of One-Hidden-Layer Neural Networks
  Gal Vardi · Ohad Shamir · Nati Srebro
- 2022 Poster: On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
  Itay Safran · Gal Vardi · Jason Lee
- 2022 Poster: Reconstructing Training Data From Trained Neural Networks
  Niv Haim · Gal Vardi · Gilad Yehudai · Ohad Shamir · Michal Irani
- 2022 Poster: Gradient Methods Provably Converge to Non-Robust Networks
  Gal Vardi · Gilad Yehudai · Ohad Shamir
- 2021 : AI Driving Olympics + Q&A
  Andrea Censi · Liam Paull · Jacopo Tani · Emilio Frazzoli · Holger Caesar · Matthew Walter · Andrea Daniele · Sahika Genc · Sharada Mohanty
- 2021 Poster: Online Meta-Learning via Learning with Layer-Distributed Memory
  Sudarshan Babu · Pedro Savarese · Michael Maire
- 2021 Poster: Learning a Single Neuron with Bias Using Gradient Descent
  Gal Vardi · Gilad Yehudai · Ohad Shamir
- 2021 Poster: A Stochastic Newton Algorithm for Distributed Convex Optimization
  Brian Bullins · Kshitij Patel · Ohad Shamir · Nathan Srebro · Blake Woodworth
- 2020 : Pruning Neural Networks at Initialization: Why Are We Missing the Mark?
  Jonathan Frankle
- 2020 : Revisiting "Qualitatively Characterizing Neural Network Optimization Problems"
  Jonathan Frankle
- 2020 : Panel
  Kilian Weinberger · Maria De-Arteaga · Shibani Santurkar · Jonathan Frankle · Deborah Raji
- 2020 Workshop: Self-Supervised Learning for Speech and Audio Processing
  Abdelrahman Mohamed · Hung-yi Lee · Shinji Watanabe · Shang-Wen Li · Tara Sainath · Karen Livescu
- 2020 Poster: Neural Networks with Small Weights and Depth-Separation Barriers
  Gal Vardi · Ohad Shamir
- 2020 Poster: Winning the Lottery with Continuous Sparsification
  Pedro Savarese · Hugo Silva · Michael Maire
- 2020 Poster: Minibatch vs Local SGD for Heterogeneous Distributed Learning
  Blake Woodworth · Kumar Kshitij Patel · Nati Srebro
- 2020 Poster: Self-Supervised Visual Representation Learning from Hierarchical Grouping
  Xiao Zhang · Michael Maire
- 2020 Spotlight: Self-Supervised Visual Representation Learning from Hierarchical Grouping
  Xiao Zhang · Michael Maire
- 2020 Poster: The Lottery Ticket Hypothesis for Pre-trained BERT Networks
  Tianlong Chen · Jonathan Frankle · Shiyu Chang · Sijia Liu · Yang Zhang · Zhangyang Wang · Michael Carbin
- 2020 Poster: Hardness of Learning Neural Networks with Natural Weights
  Amit Daniely · Gal Vardi
- 2019 : Contributed Session - Spotlight Talks
  Jonathan Frankle · David Schwab · Ari Morcos · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · YiDing Jiang · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Sho Yaida · Muqiao Yang
- 2019 : Lunch Break and Posters
  Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
- 2019 : The AI Driving Olympics: An Accessible Robot Learning Benchmark
  Matthew Walter
- 2019 Poster: Communication trade-offs for Local-SGD with large step size
  Aymeric Dieuleveut · Kumar Kshitij Patel
- 2019 Poster: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
  Falcon Dai · Matthew Walter
- 2017 : Panel: Machine learning and audio signal processing: State of the art and future perspectives
  Sepp Hochreiter · Bo Li · Karen Livescu · Arindam Mandal · Oriol Nieto · Malcolm Slaney · Hendrik Purwins
- 2017 : Acoustic word embeddings for speech search
  Karen Livescu
- 2015 : Listen, Attend and Walk: Neural Mapping of Navigational Instructions to Action Sequences
  Matthew Walter