Poster
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Hongjian Wang · Mert Gurbuzbalaban · Lingjiong Zhu · Umut Simsekli · Murat Erdogdu
Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of second-order moments. In this paper, we provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p\in [1,2)$, we first identify a condition on the Hessian, coined `$p$-positive (semi-)definiteness', that leads to an interesting interpolation between the positive semi-definite cone ($p=2$) and the cone of diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert averaging converges weakly to a multivariate $\alpha$-stable random vector. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without requiring any modification to either the loss function or the algorithm itself, as is typically required in robust statistics. We demonstrate the implications of our results on misspecified models in the presence of heavy-tailed data.
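As an illustrative sketch only (not code from the paper), the setting can be simulated in a few lines: run SGD on a strongly convex quadratic whose gradients are corrupted by Student-t noise with 1.5 degrees of freedom, which has infinite variance but finite $p$-th moments for $p < 1.5$, i.e. the regime $p \in [1,2)$ studied above. For simplicity the noise here is state-independent (the paper allows state-dependent noise), and all variable names are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
x_star = np.array([1.0, -2.0])  # global optimum of f(x) = 0.5*||x - x_star||^2

def noisy_grad(x):
    # True gradient plus heavy-tailed noise: Student-t with 1.5 degrees
    # of freedom has infinite variance but finite p-th moments, p < 1.5.
    return (x - x_star) + rng.standard_t(df=1.5, size=d)

x = np.zeros(d)
avg = np.zeros(d)          # running Polyak-Ruppert average of iterates
n_iters = 50_000
for k in range(1, n_iters + 1):
    eta = 0.5 / k ** 0.9   # decaying step size
    x = x - eta * noisy_grad(x)
    avg += (x - avg) / k

# Despite infinite-variance noise and no clipping or other robustification,
# the averaged iterate ends up near x_star.
print(np.linalg.norm(avg - x_star))
```

The averaged iterate's fluctuations around the optimum are themselves heavy-tailed, consistent with the $\alpha$-stable limit in the generalized central limit theorem, so individual runs can show occasional large excursions.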
Author Information
Hongjian Wang (Carnegie Mellon University)
Mert Gurbuzbalaban (Rutgers University)
Lingjiong Zhu (Florida State University)
Umut Simsekli (Inria Paris / ENS)
Murat Erdogdu (University of Toronto)
More from the Same Authors
- 2021 Spotlight: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
  Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2022: Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
  Alireza Mousavi-Hosseini · Sejun Park · Manuela Girotti · Ioannis Mitliagkas · Murat Erdogdu
- 2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
  Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang
- 2022 Poster: Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers
  Sejun Park · Umut Simsekli · Murat Erdogdu
- 2021 Poster: Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
  Melih Barsbey · Milad Sefidgaran · Murat Erdogdu · Gaël Richard · Umut Simsekli
- 2021 Poster: Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
  Tolga Birdal · Aaron Lou · Leonidas Guibas · Umut Simsekli
- 2021 Poster: An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
  Lu Yu · Krishnakumar Balasubramanian · Stanislav Volgushev · Murat Erdogdu
- 2021 Poster: Manipulating SGD with Data Ordering Attacks
  I Shumailov · Zakhar Shumaylov · Dmitry Kazhdan · Yiren Zhao · Nicolas Papernot · Murat Erdogdu · Ross J Anderson
- 2021 Poster: On Empirical Risk Minimization with Dependent and Heavy-Tailed Data
  Abhishek Roy · Krishnakumar Balasubramanian · Murat Erdogdu
- 2021 Poster: Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
  Kimia Nadjahi · Alain Durmus · Pierre E Jacob · Roland Badeau · Umut Simsekli
- 2021 Poster: Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
  Alexander Camuto · George Deligiannidis · Murat Erdogdu · Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2020 Poster: On the Ergodicity, Bias and Asymptotic Normality of Randomized Midpoint Sampling Method
  Ye He · Krishnakumar Balasubramanian · Murat Erdogdu
- 2020 Poster: Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization
  Xuefeng Gao · Mert Gurbuzbalaban · Lingjiong Zhu
- 2020 Poster: Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
  Umut Simsekli · Ozan Sener · George Deligiannidis · Murat Erdogdu
- 2020 Spotlight: Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
  Umut Simsekli · Ozan Sener · George Deligiannidis · Murat Erdogdu
- 2019 Poster: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
  Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu
- 2019 Spotlight: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
  Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu
- 2018 Poster: Global Non-convex Optimization with Discretized Diffusions
  Murat Erdogdu · Lester Mackey · Ohad Shamir
- 2017 Poster: Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding
  Mainak Jas · Tom Dupré la Tour · Umut Simsekli · Alexandre Gramfort
- 2017 Poster: Robust Estimation of Neural Signals in Calcium Imaging
  Hakan Inan · Murat Erdogdu · Mark Schnitzer
- 2017 Poster: When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
  Mert Gurbuzbalaban · Asuman Ozdaglar · Pablo A Parrilo · Nuri Vanli
- 2017 Spotlight: When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
  Mert Gurbuzbalaban · Asuman Ozdaglar · Pablo A Parrilo · Nuri Vanli
- 2017 Poster: Inference in Graphical Models via Semidefinite Programming Hierarchies
  Murat Erdogdu · Yash Deshpande · Andrea Montanari
- 2016 Poster: Scaled Least Squares Estimator for GLMs in Large-Scale Problems
  Murat Erdogdu · Lee H Dicker · Mohsen Bayati
- 2015 Poster: Convergence rates of sub-sampled Newton methods
  Murat Erdogdu · Andrea Montanari
- 2015 Poster: Newton-Stein Method: A Second Order Method for GLMs via Stein's Lemma
  Murat Erdogdu
- 2015 Spotlight: Newton-Stein Method: A Second Order Method for GLMs via Stein's Lemma
  Murat Erdogdu
- 2013 Poster: Estimating LASSO Risk and Noise Level
  Mohsen Bayati · Murat Erdogdu · Andrea Montanari