Timezone: »
We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models and Ito calculus to infer novel insights on the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
Author Information
Antonio Orvieto (ETH Zurich)
PhD Student at ETH Zurich. I’m interested in the design and analysis of optimization algorithms for deep learning. Interned at DeepMind, MILA, and Meta. All publications at http://orvi.altervista.org/ Looking for postdoc positions! :) antonio.orvieto@inf.ethz.ch
Aurelien Lucchi (ETH Zurich)
More from the Same Authors
-
2022 : Batch size selection by stochastic optimal contro »
Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting -
2022 : Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning »
Sanghwan Kim · Lorenzo Noci · Antonio Orvieto · Thomas Hofmann -
2022 Poster: On the Theoretical Properties of Noise Correlation in Stochastic Optimization »
Aurelien Lucchi · Frank Proske · Antonio Orvieto · Francis Bach · Hans Kersting -
2022 Poster: Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse »
Lorenzo Noci · Sotiris Anagnostidis · Luca Biggio · Antonio Orvieto · Sidak Pal Singh · Aurelien Lucchi -
2022 Poster: Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution »
Antonio Orvieto · Simon Lacoste-Julien · Nicolas Loizou -
2021 : Empirics on the expressiveness of Randomized Signature »
Enea Monzio Compagnoni · Luca Biggio · Antonio Orvieto -
2021 Poster: Rethinking the Variational Interpretation of Accelerated Optimization Methods »
Peiyuan Zhang · Antonio Orvieto · Hadi Daneshmand -
2021 Poster: On the Second-order Convergence Properties of Random Search Methods »
Aurelien Lucchi · Antonio Orvieto · Adamos Solomou -
2020 Poster: Batch normalization provably avoids ranks collapse for randomly initialised deep networks »
Hadi Daneshmand · Jonas Kohler · Francis Bach · Thomas Hofmann · Aurelien Lucchi -
2020 Poster: Convolutional Generation of Textured 3D Meshes »
Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi -
2020 Oral: Convolutional Generation of Textured 3D Meshes »
Dario Pavllo · Graham Spinks · Thomas Hofmann · Marie-Francine Moens · Aurelien Lucchi -
2019 : Spotlight talks »
Paul Grigas · Zhewei Yao · Aurelien Lucchi · Si Yi Meng -
2019 Poster: Shadowing Properties of Optimization Algorithms »
Antonio Orvieto · Aurelien Lucchi -
2019 Poster: A Domain Agnostic Measure for Monitoring and Evaluating GANs »
Paulina Grnarova · Kfir Y. Levy · Aurelien Lucchi · Nathanael Perraudin · Ian Goodfellow · Thomas Hofmann · Andreas Krause -
2017 Poster: Stabilizing Training of Generative Adversarial Networks through Regularization »
Kevin Roth · Aurelien Lucchi · Sebastian Nowozin · Thomas Hofmann -
2016 Poster: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy »
Aryan Mokhtari · Hadi Daneshmand · Aurelien Lucchi · Thomas Hofmann · Alejandro Ribeiro -
2015 Poster: Variance Reduced Stochastic Gradient Descent with Neighbors »
Thomas Hofmann · Aurelien Lucchi · Simon Lacoste-Julien · Brian McWilliams