Poster
First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
Thanh Huy Nguyen · Umut Simsekli · Mert Gurbuzbalaban · Gaël Richard
Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #239
Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings exhibits non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled with $\alpha$-stable distributions, a family of heavy-tailed distributions that arise in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a Lévy motion, and the metastability results for this SDE can then be used to illuminate the behavior of SGD, especially its tendency to prefer wide minima. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might behave significantly differently from its continuous-time limit. Intuitively, the two systems are expected to behave similarly only when the discretization step is sufficiently small; however, to the best of our knowledge, there is no theoretical understanding of how small the step-size should be chosen in order to guarantee that the discretized system inherits the properties of the continuous-time system. In this study, we provide a formal theoretical analysis in which we derive explicit conditions on the step-size such that the metastability behavior of the discrete-time system is similar to that of its continuous-time limit. We show that the behaviors of the two systems are indeed similar for small step-sizes, and we identify how the error depends on the algorithm and problem parameters. We illustrate our results with simulations on a synthetic model and neural networks.
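The abstract views SGD with heavy-tailed gradient noise as an Euler discretization of a Lévy-driven SDE and studies its metastability through first exit times. The sketch below (not the authors' code; the double-well objective, step-size, noise scale, and exit radius are illustrative assumptions) simulates such an iteration, x_{k+1} = x_k - η f'(x_k) + η^{1/α} σ S_k with symmetric α-stable noise S_k, and records when the iterate first leaves a neighborhood of a local minimum.

```python
# Minimal illustrative sketch (assumed toy setup, not the paper's experiments):
# discretized Levy-driven SDE  dX_t = -f'(X_t) dt + sigma dL_t^alpha, simulated as
#   x_{k+1} = x_k - eta * f'(x_k) + eta**(1/alpha) * sigma * S_k,
# where S_k is symmetric alpha-stable noise; we record the first exit time from a
# ball around a local minimum of the double-well objective f(x) = (x^2 - 1)^2.
import numpy as np
from scipy.stats import levy_stable


def first_exit_time(alpha=1.8, eta=1e-3, sigma=0.1, radius=1.0,
                    x0=-1.0, max_iters=200_000, seed=0):
    rng = np.random.default_rng(seed)
    grad = lambda x: 4.0 * x * (x**2 - 1.0)   # f'(x) for f(x) = (x^2 - 1)^2, minima at -1 and +1
    x = x0
    for k in range(1, max_iters + 1):
        s = float(levy_stable.rvs(alpha, 0.0, random_state=rng))  # symmetric alpha-stable sample
        x = x - eta * grad(x) + (eta ** (1.0 / alpha)) * sigma * s
        if abs(x - x0) > radius:              # iterate has left the basin around x0
            return k
    return max_iters                          # no exit observed within the iteration budget


if __name__ == "__main__":
    # Heavier tails (smaller alpha) typically lead to much earlier exits from the basin.
    for alpha in (2.0, 1.8, 1.5):             # alpha = 2 corresponds to Gaussian noise
        print(f"alpha = {alpha}: exit after {first_exit_time(alpha=alpha)} iterations")
```

The η^{1/α} scaling of the noise term is what makes the iteration a consistent discretization of the α-stable SDE; averaging the exit time over many seeds and sweeping η would illustrate the kind of step-size dependence the paper analyzes.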
Author Information
Thanh Huy Nguyen (Telecom ParisTech)
Umut Simsekli (Institut Polytechnique de Paris/ University of Oxford)
Mert Gurbuzbalaban (Rutgers)
Gaël Richard (Télécom ParisTech)
More from the Same Authors
- 2021 Spotlight: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
  Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2021 Poster: Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
  Melih Barsbey · Milad Sefidgaran · Murat Erdogdu · Gaël Richard · Umut Simsekli
- 2021 Poster: Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
  Tolga Birdal · Aaron Lou · Leonidas Guibas · Umut Simsekli
- 2021 Poster: Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
  Hongjian Wang · Mert Gurbuzbalaban · Lingjiong Zhu · Umut Simsekli · Murat Erdogdu
- 2021 Poster: Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
  Kimia Nadjahi · Alain Durmus · Pierre E Jacob · Roland Badeau · Umut Simsekli
- 2021 Poster: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
  Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2020 Poster: Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization
  Xuefeng Gao · Mert Gurbuzbalaban · Lingjiong Zhu
- 2020 Poster: IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method
  Yossi Arjevani · Joan Bruna · Bugra Can · Mert Gurbuzbalaban · Stefanie Jegelka · Hongzhou Lin
- 2020 Spotlight: IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method
  Yossi Arjevani · Joan Bruna · Bugra Can · Mert Gurbuzbalaban · Stefanie Jegelka · Hongzhou Lin
- 2019 Poster: Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance
  Kimia Nadjahi · Alain Durmus · Umut Simsekli · Roland Badeau
- 2019 Spotlight: Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance
  Kimia Nadjahi · Alain Durmus · Umut Simsekli · Roland Badeau
- 2019 Poster: A Universally Optimal Multistage Accelerated Stochastic Gradient Method
  Necdet Serhat Aybat · Alireza Fallah · Mert Gurbuzbalaban · Asuman Ozdaglar
- 2019 Poster: Generalized Sliced Wasserstein Distances
  Soheil Kolouri · Kimia Nadjahi · Umut Simsekli · Roland Badeau · Gustavo Rohde
- 2018 Poster: Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC
  Tolga Birdal · Umut Simsekli · Mustafa Onur Eken · Slobodan Ilic
- 2017 Poster: Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding
  Mainak Jas · Tom Dupré la Tour · Umut Simsekli · Alexandre Gramfort
- 2016 Poster: Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo
  Alain Durmus · Umut Simsekli · Eric Moulines · Roland Badeau · Gaël Richard
- 2011 Poster: Generalised Coupled Tensor Factorisation
  Kenan Y Yılmaz · Taylan Cemgil · Umut Simsekli