Empirical studies have revealed that many minima in the loss landscape of deep learning are connected and lie in a low-loss valley. Yet little is known about the theoretical origin of these low-loss valleys. Ensemble models that sample different parts of a low-loss valley have reached state-of-the-art performance. However, we lack theoretical tools to measure which portions of a low-loss valley are explored during training. We address both aspects of low-loss valleys using symmetries and conserved quantities. We show that continuous symmetries in the parameter space of neural networks can give rise to low-loss valleys. We then show that the conserved quantities associated with these symmetries can be used to define coordinates along low-loss valleys. These conserved quantities reveal that gradient flow explores only a small part of a low-loss valley. Finally, we use conserved quantities to explore other parts of the valley by proposing alternative initialization schemes.
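A minimal NumPy sketch can make the two central ideas concrete. It assumes a toy two-layer linear network f(x) = U V x trained with a mean-squared-error loss on random data; the sizes, seed, and function names are illustrative assumptions and do not reproduce the authors' code. For any invertible matrix g, replacing (U, V) with (U g^-1, g V) leaves the loss unchanged, so every minimum belongs to a continuous family of equally good parameters. Under continuous-time gradient flow, the matrix Q = U^T U - V V^T is conserved; this is the kind of quantity the abstract uses as a coordinate along a valley. Plain gradient descent with a small step only approximates gradient flow, so here Q drifts slightly.

```python
import numpy as np

# Hypothetical toy setup: two-layer linear network f(x) = U @ V @ x with MSE loss.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 4, 3, 2, 50
X = rng.normal(size=(d_in, n))
Y = rng.normal(size=(d_out, n))


def loss(U, V):
    """Half mean squared error of the network U V on (X, Y)."""
    R = U @ V @ X - Y
    return 0.5 * np.sum(R ** 2) / n


def grads(U, V):
    """Analytic gradients of `loss` with respect to U and V."""
    R = (U @ V @ X - Y) / n  # scaled residual, shape (d_out, n)
    gU = R @ (V @ X).T       # dL/dU, shape (d_out, d_hidden)
    gV = U.T @ R @ X.T       # dL/dV, shape (d_hidden, d_in)
    return gU, gV


def conserved(U, V):
    """Q = U^T U - V V^T, exactly conserved under continuous-time gradient flow."""
    return U.T @ U - V @ V.T


U0 = 0.5 * rng.normal(size=(d_out, d_hidden))
V0 = 0.5 * rng.normal(size=(d_hidden, d_in))

# 1) Continuous symmetry: (U g^-1, g V) has the same loss for any invertible g.
g = np.eye(d_hidden) + 0.3 * rng.normal(size=(d_hidden, d_hidden))
U_g, V_g = U0 @ np.linalg.inv(g), g @ V0
print("loss before/after symmetry:", loss(U0, V0), loss(U_g, V_g))


# 2) Conserved quantity: small-step gradient descent approximates gradient flow,
#    so Q drifts only by O(lr) over a fixed total training time (lr * steps).
def train(U, V, lr, steps):
    for _ in range(steps):
        gU, gV = grads(U, V)
        U, V = U - lr * gU, V - lr * gV
    return U, V


U1, V1 = train(U0, V0, lr=1e-3, steps=2000)
q_drift = np.linalg.norm(conserved(U1, V1) - conserved(U0, V0))
uu_change = np.linalg.norm(U1.T @ U1 - U0.T @ U0)
print("loss:", loss(U0, V0), "->", loss(U1, V1))
print("change in U^T U:", uu_change, " drift in Q:", q_drift)
```

The weights themselves move noticeably (U^T U changes), while Q stays nearly fixed. Rerunning with a smaller learning rate and proportionally more steps should shrink the drift in Q roughly linearly, consistent with exact conservation in the continuous-time limit.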
Author Information
Bo Zhao (University of California, San Diego)
Iordan Ganev (Radboud University)
Robin Walters (Northeastern University)
Rose Yu (UC San Diego)
Nima Dehmamy (IBM Research)
I obtained my PhD in physics, working on complex systems, at Boston University in 2016. I then did a postdoc at Northeastern University, working on 3D embedded graphs and graph neural networks. My current research is on physics-informed machine learning and computational social science.
More from the Same Authors
2020 : Paper 60: Traffic Forecasting using Vehicle-to-Vehicle Communication and Recurrent Neural Networks
Rose Yu
2022 : A Noether's theorem for gradient flow: Continuous symmetries of the architecture and conserved quantities of gradient flow
Bo Zhao · Iordan Ganev · Robin Walters · Rose Yu · Nima Dehmamy
2022 : Understanding Optimization Challenges when Encoding to Geometric Structures
Babak Esmaeili · Robin Walters · Heiko Zimmermann · Jan-Willem van de Meent
2022 : Image to Icosahedral Projection for $\mathrm{SO}(3)$ Object Reasoning from Single-View Images
David Klee · Ondrej Biza · Robert Platt · Robin Walters
2022 : Rethinking Neural Relational Inference for Granger Causal Discovery
Stefanos Bennett · Rose Yu
2022 : Rose Yu: "Physics-Guided Deep Learning for Climate Science"
Rose Yu
2022 : Keynote Talk 2
Rose Yu
2022 : Panel Discussion I: Geometric and topological principles for representation learning in ML
Irina Higgins · Taco Cohen · Erik Bekkers · Nina Miolane · Rose Yu
2022 Poster: Meta-Learning Dynamics Forecasting Using Task Inference
Rui Wang · Robin Walters · Rose Yu
2022 Poster: Symmetry Teleportation for Accelerated Optimization
Bo Zhao · Nima Dehmamy · Robin Walters · Rose Yu
2021 : Physics-Guided AI for Modeling Autonomous Vehicle Dynamics
Rose Yu
2021 Poster: Automatic Symmetry Discovery with Lie Algebra Convolutional Network
Nima Dehmamy · Robin Walters · Yanchen Liu · Dashun Wang · Rose Yu
2020 : Q/A and Discussion for ML Theory Session
Karthik Kashinath · Mayur Mudigonda · Stephan Mandt · Rose Yu
2020 : Rose Yu
Rose Yu
2020 : 15 - Lie Algebra Convolutional Networks with Automatic Symmetry Extraction
Nima Dehmamy
2020 : Rose Yu - Physics-Guided AI for Learning Spatiotemporal Dynamics
Rose Yu
2020 Workshop: Machine Learning for Engineering Modeling, Simulation and Design
Alex Beatson · Priya Donti · Amira Abdel-Rahman · Stephan Hoyer · Rose Yu · J. Zico Kolter · Ryan Adams
2020 : Invited Talk 11 Q&A by Rose
Rose Yu
2020 : Invited Talk 11: Tensor Methods for Efficient and Interpretable Spatiotemporal Learning
Rose Yu
2020 : Quantifying Uncertainty in Deep Spatiotemporal Forecasting for COVID-19
Yian Ma · Rose Yu
2020 Poster: Deep Imitation Learning for Bimanual Robotic Manipulation
Fan Xie · Alexander Chowdhury · M. Clara De Paolis Kaluza · Linfeng Zhao · Lawson Wong · Rose Yu
2020 Poster: Learning Disentangled Representations of Videos with Missing Data
Armand Comas · Chi Zhang · Zlatan Feric · Octavia Camps · Rose Yu
2020 Session: Orals & Spotlights Track 06: Dynamical Sys/Density/Sparsity
Animesh Garg · Rose Yu
2018 : Long Range Sequence Generation via Multiresolution Adversarial Training
Rose Yu