Stochastic gradient algorithms are widely used for large-scale learning and inference problems. However, their use in practice is typically guided by heuristics and trial-and-error rather than rigorous, generalizable theory. We take a step toward better understanding the effect of the tuning parameters of these algorithms by characterizing the large-sample behavior of iterates of a very general class of preconditioned stochastic gradient algorithms with fixed step size, including stochastic gradient descent with and without additional Gaussian noise, momentum, and/or acceleration. We show that near a local optimum, the iterates converge weakly to paths of an Ornstein–Uhlenbeck process, and provide sufficient conditions for the stationary distributions of the finite-sample processes to converge weakly to that of the limiting process. In particular, with appropriate choices of tuning parameters, the limiting stationary covariance can match either the Bernstein–von Mises-limit of the posterior, adjustments to the posterior for model misspecification, or the asymptotic distribution of the maximum likelihood estimate – and that with a naive tuning, the limit corresponds to none of these. Moreover, we argue that, in the large-sample regime, an essentially independent sample from the stationary distribution can be obtained after a fixed number of passes over the dataset. Our results show that properly tuned stochastic gradient algorithms offer a practical approach to obtaining inferences that are computationally efficient and statistically robust.
Speaker Bio: Jonathan Huggins is an Assistant Professor in the Department of Mathematics & Statistics, a Data Science Faculty Fellow, and a Founding Member of the Faculty of Computing & Data Sciences at Boston University. Prior to joining BU, he was a Postdoctoral Research Fellow in the Department of Biostatistics at Harvard. He completed his Ph.D. in Computer Science at the Massachusetts Institute of Technology in 2018. Previously, he received a B.A. in Mathematics from Columbia University and an S.M. in Computer Science from the Massachusetts Institute of Technology. His research centers on the development of fast, trustworthy machine learning and AI methods that balance the need for computational efficiency and the desire for statistical optimality with the inherent imperfections that come from real-world problems, large datasets, and complex models. His current applied work is focused on methods to enable more effective scientific discovery from high-throughput and multi-modal genomic data.