Poster in Workshop: OPT 2023: Optimization for Machine Learning
Noise-adaptive (Accelerated) Stochastic Heavy-Ball Momentum
Anh Dang · Reza Babanezhad Harikandeh · Sharan Vaswani
Abstract:
We analyze the convergence of stochastic heavy ball (SHB) momentum in the smooth, strongly-convex setting. Kidambi et al. showed that SHB (with small mini-batches) cannot attain an accelerated rate of convergence even for quadratics, and conjectured that the practical gains of SHB are a by-product of mini-batching. We substantiate this claim by showing that SHB can obtain an accelerated rate when the mini-batch size is larger than some threshold. In particular, for strongly-convex quadratics with condition number $\kappa$, we prove that SHB with the standard step-size and momentum parameters results in an $O(\exp(-T/\sqrt{\kappa}) + \sigma)$ convergence rate, where $T$ is the number of iterations and $\sigma^2$ is the variance in the stochastic gradients. To ensure convergence to the minimizer, we propose a multi-stage approach that results in a noise-adaptive $O(\exp(-T/\sqrt{\kappa}) + \sigma/T)$ rate. For general strongly-convex functions, we use the averaging interpretation of SHB along with exponential step-sizes to prove an $O(\exp(-T/\kappa) + \sigma^2/T)$ convergence to the minimizer. Finally, we empirically demonstrate the effectiveness of the proposed algorithms.
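As an illustrative sketch (not taken from the paper), the SHB update analyzed here is $x_{t+1} = x_t - \alpha g_t + \beta (x_t - x_{t-1})$, where $g_t$ is a stochastic gradient. The quadratic objective, the additive Gaussian noise model, and the classical deterministic heavy-ball parameter choices below are assumptions for illustration only; the paper's own step-size schedules (e.g. the multi-stage and exponential step-sizes) are not reproduced here.

```python
import numpy as np

def shb_quadratic(A, x0, T, sigma=0.0, seed=0):
    """Run SHB on f(x) = 0.5 * x^T A x with additive Gaussian gradient noise.

    Illustrative sketch: parameters are the classical heavy-ball choices,
    not the schedules proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    eigs = np.linalg.eigvalsh(A)
    mu, L = eigs[0], eigs[-1]          # strong-convexity and smoothness constants
    kappa = L / mu                     # condition number
    # Classical (deterministic) heavy-ball step-size and momentum parameters:
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)) ** 2
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(T):
        g = A @ x + sigma * rng.standard_normal(x.shape)   # noisy gradient of f
        # SHB update: x_{t+1} = x_t - alpha * g_t + beta * (x_t - x_{t-1})
        x, x_prev = x - alpha * g + beta * (x - x_prev), x
    return x

# With no noise (sigma = 0), SHB converges linearly to the minimizer x* = 0;
# with noise, the iterates settle in a neighborhood whose size scales with sigma.
A = np.diag([1.0, 100.0])                          # condition number kappa = 100
x_final = shb_quadratic(A, np.array([1.0, 1.0]), T=500)
```

The noise floor in the $O(\exp(-T/\sqrt{\kappa}) + \sigma)$ rate is visible in this sketch: rerunning with `sigma > 0` leaves a residual error that does not vanish with more iterations, which is what the multi-stage approach is designed to remove.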