The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization. One of the main ideas in this line of research comes from the field of classical mechanics and proposes to link Nesterov's trajectory to the solution of a set of Euler-Lagrange equations relative to the so-called Bregman Lagrangian. In recent years, this approach has led to the discovery of many new (stochastic) accelerated algorithms and provided a solid theoretical foundation for the design of structure-preserving accelerated methods. In this work, we revisit this idea and provide an in-depth analysis of the action relative to the Bregman Lagrangian from the point of view of calculus of variations. Our main finding is that, while Nesterov's method is a stationary point for the action, it is often not a minimizer but instead a saddle point for this functional in the space of differentiable curves. This finding challenges the main intuition behind the variational interpretation of Nesterov's method and provides additional insights into the intriguing geometry of accelerated paths.
Author Information
Peiyuan Zhang (ETH Zurich)
Antonio Orvieto (ETH Zurich)
PhD Student at ETH Zurich. I’m interested in the design and analysis of optimization algorithms for deep learning. Interned at DeepMind, MILA, and Meta. All publications at http://orvi.altervista.org/ Looking for postdoc positions! :) antonio.orvieto@inf.ethz.ch
Hadi Daneshmand (INRIA PARIS)
More from the Same Authors
- 2021 Spotlight: Batch Normalization Orthogonalizes Representations in Deep Random Networks
  Hadi Daneshmand · Amir Joudaki · Francis Bach
- 2022: Batch size selection by stochastic optimal control
  Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting
- 2022: Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
  Sanghwan Kim · Lorenzo Noci · Antonio Orvieto · Thomas Hofmann
- 2023 Poster: On the impact of activation and normalization in obtaining isometric embeddings at initialization
  Amir Joudaki · Hadi Daneshmand · Francis Bach
- 2023 Poster: Transformers learn to implement preconditioned gradient descent for in-context learning
  Kwangjun Ahn · Xiang Cheng · Hadi Daneshmand · Suvrit Sra
- 2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization
  Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting
- 2022 Poster: Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse
  Lorenzo Noci · Sotiris Anagnostidis · Luca Biggio · Antonio Orvieto · Sidak Pal Singh · Aurelien Lucchi
- 2022 Poster: Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution
  Antonio Orvieto · Simon Lacoste-Julien · Nicolas Loizou
- 2021: Empirics on the expressiveness of Randomized Signature
  Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto
- 2021 Poster: On the Second-order Convergence Properties of Random Search Methods
  Aurelien Lucchi · Antonio Orvieto · Adamos Solomou
- 2021 Poster: Batch Normalization Orthogonalizes Representations in Deep Random Networks
  Hadi Daneshmand · Amir Joudaki · Francis Bach
- 2019 Poster: Shadowing Properties of Optimization Algorithms
  Antonio Orvieto · Aurelien Lucchi
- 2019 Poster: Continuous-time Models for Stochastic Optimization Algorithms
  Antonio Orvieto · Aurelien Lucchi