Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior. However, to obtain the desired effect, the step-size should be chosen sufficiently large, a choice that is problem-dependent and can be difficult in practice. In this study, we incorporate a chaotic component into GD in a controlled manner and introduce \emph{multiscale perturbed GD} (MPGD), a novel optimization framework in which the GD recursion is augmented with chaotic perturbations that evolve via an independent dynamical system. We analyze MPGD from three different angles: (i) Building on recent advances in rough paths theory, we show that, under appropriate assumptions, as the step-size decreases, the MPGD recursion converges weakly to a stochastic differential equation (SDE) driven by a heavy-tailed L\'{e}vy-stable process. (ii) Making connections to recently developed generalization bounds for heavy-tailed processes, we derive a generalization bound for the limiting SDE and relate the worst-case generalization error over the trajectories of the process to the parameters of MPGD. (iii) We analyze the implicit regularization effect brought by the chaotic perturbations and show that, in the weak perturbation regime, MPGD introduces terms that penalize the Hessian of the loss function. Empirical results are provided to demonstrate the advantages of MPGD.
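For intuition, here is a minimal sketch of the core idea: a GD recursion augmented at each step with a perturbation generated by an independent chaotic dynamical system. The choice of the logistic map, the amplitude eps, and the sqrt(eta) scaling below are illustrative assumptions, not the paper's exact multiscale construction.

import numpy as np

def mpgd(grad, x0, eta=0.01, eps=0.1, n_steps=1000, seed=0):
    """Sketch of multiscale perturbed GD: plain GD plus a chaotic
    perturbation driven by an independent dynamical system.
    The logistic map and the eps * sqrt(eta) scaling are
    illustrative assumptions, not the paper's construction."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    # Independent chaotic state, one coordinate per parameter,
    # initialized away from the map's fixed points.
    z = rng.uniform(0.1, 0.9, size=x.shape)
    for _ in range(n_steps):
        # Evolve the independent chaotic system (fully chaotic logistic map).
        z = 4.0 * z * (1.0 - z)
        # GD step plus a centered chaotic perturbation.
        x = x - eta * grad(x) + eps * np.sqrt(eta) * (z - 0.5)
    return x

# Example usage: minimize a simple quadratic f(x) = ||x||^2.
grad = lambda x: 2.0 * x
print(mpgd(grad, x0=np.ones(3)))

The perturbation source runs on its own state z and never sees the loss, which matches the abstract's description of perturbations that "evolve via an independent dynamical system"; the amplitude eps plays the role of the controlled chaotic component.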
Author Information
Soon Hoe Lim (Nordita)
Yijun Wan (École Normale Supérieure de Paris)
Umut Simsekli (Inria Paris / ENS)
More from the Same Authors
- 2021 Spotlight: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
  Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2022 Poster: Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers
  Sejun Park · Umut Simsekli · Murat Erdogdu
- 2021 Poster: Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
  Melih Barsbey · Milad Sefidgaran · Murat Erdogdu · Gaël Richard · Umut Simsekli
- 2021 Poster: Noisy Recurrent Neural Networks
  Soon Hoe Lim · N. Benjamin Erichson · Liam Hodgkinson · Michael Mahoney
- 2021 Poster: Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
  Tolga Birdal · Aaron Lou · Leonidas Guibas · Umut Simsekli
- 2021 Poster: Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
  Hongjian Wang · Mert Gurbuzbalaban · Lingjiong Zhu · Umut Simsekli · Murat Erdogdu
- 2021 Poster: Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
  Kimia Nadjahi · Alain Durmus · Pierre E Jacob · Roland Badeau · Umut Simsekli
- 2021 Poster: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
  Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2020 Poster: Statistical and Topological Properties of Sliced Probability Divergences
  Kimia Nadjahi · Alain Durmus · Lénaïc Chizat · Soheil Kolouri · Shahin Shahrampour · Umut Simsekli
- 2020 Spotlight: Statistical and Topological Properties of Sliced Probability Divergences
  Kimia Nadjahi · Alain Durmus · Lénaïc Chizat · Soheil Kolouri · Shahin Shahrampour · Umut Simsekli
- 2020 Poster: Explicit Regularisation in Gaussian Noise Injections
  Alexander Camuto · Matthew Willetts · Umut Simsekli · Stephen J Roberts · Chris C Holmes
- 2020 Poster: Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
  Umut Simsekli · Ozan Sener · George Deligiannidis · Murat Erdogdu
- 2020 Poster: Quantitative Propagation of Chaos for SGD in Wide Neural Networks
  Valentin De Bortoli · Alain Durmus · Xavier Fontaine · Umut Simsekli
- 2020 Spotlight: Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
  Umut Simsekli · Ozan Sener · George Deligiannidis · Murat Erdogdu
- 2019 Poster: Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance
  Kimia Nadjahi · Alain Durmus · Umut Simsekli · Roland Badeau
- 2019 Spotlight: Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance
  Kimia Nadjahi · Alain Durmus · Umut Simsekli · Roland Badeau
- 2019 Poster: First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
  Thanh Huy Nguyen · Umut Simsekli · Mert Gurbuzbalaban · Gaël Richard
- 2019 Poster: Generalized Sliced Wasserstein Distances
  Soheil Kolouri · Kimia Nadjahi · Umut Simsekli · Roland Badeau · Gustavo Rohde
- 2018 Poster: Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC
  Tolga Birdal · Umut Simsekli · Mustafa Onur Eken · Slobodan Ilic
- 2017 Poster: Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding
  Mainak Jas · Tom Dupré la Tour · Umut Simsekli · Alexandre Gramfort
- 2016 Poster: Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo
  Alain Durmus · Umut Simsekli · Eric Moulines · Roland Badeau · Gaël Richard
- 2011 Poster: Generalised Coupled Tensor Factorisation
  Kenan Y Yılmaz · Taylan Cemgil · Umut Simsekli