
Workshop: Machine Learning with New Compute Paradigms

Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

Ziyi Wang · Yujie Chen · Ruqi Zhang · Qifan Song

Sat 16 Dec 9:25 a.m. PST — 10:30 a.m. PST

Abstract: Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision samplers via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves a quadratic improvement ($\tilde{\mathcal{O}}\left({\epsilon^{-2}{\mu^*}^{-2}\log^2\left({\epsilon^{-1}}\right)}\right)$) compared to the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\tilde{\mathcal{O}}\left({{\epsilon}^{-4}{\lambda^{*}}^{-1}\log^5\left({\epsilon^{-1}}\right)}\right)$). Moreover, we prove that low-precision SGHMC is more robust to quantization error than low-precision SGLD, owing to the robustness of the momentum-based update with respect to gradient noise. Empirically, we conduct experiments on synthetic data and on the MNIST, CIFAR-10, and CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited deep learning.
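To make the idea of a low-precision SGHMC sampler concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a plain Euler discretization of the underdamped Langevin dynamics, unbiased stochastic rounding onto a fixed-point grid of spacing `delta`, and a low-precision accumulator (both position and momentum are stored on the grid). The function names, the `grad_fn(theta, rng)` signature, and all parameter names are hypothetical and chosen only for illustration; the paper's actual update rule and quantizers may differ.

```python
import numpy as np


def stochastic_round(x, delta, rng):
    """Round x onto the grid delta * Z without bias.

    Each entry is rounded up with probability equal to its fractional
    distance from the lower grid point, so E[output] == x.
    """
    x = np.asarray(x, dtype=float)
    scaled = x / delta
    lower = np.floor(scaled)
    prob_up = scaled - lower
    round_up = rng.random(x.shape) < prob_up
    return delta * (lower + round_up)


def low_precision_sghmc(grad_fn, theta0, n_steps, step_size, friction, delta, rng):
    """Illustrative SGHMC loop with quantized position and momentum.

    Assumption: Euler discretization of
        d(theta) = v dt
        d(v)     = -friction * v dt - grad U(theta) dt + sqrt(2 * friction) dW
    grad_fn(theta, rng) is a hypothetical callable returning a stochastic
    gradient estimate of the negative log-density U at theta.
    """
    theta = stochastic_round(theta0, delta, rng)
    v = np.zeros_like(theta)
    samples = np.empty((n_steps,) + theta.shape)
    for k in range(n_steps):
        grad = grad_fn(theta, rng)
        noise = rng.standard_normal(theta.shape)
        # Momentum update: friction, stochastic gradient, injected Gaussian noise.
        v = v - step_size * (friction * v + grad)
        v = v + np.sqrt(2.0 * friction * step_size) * noise
        # Low-precision accumulators: both variables are stored on the grid.
        v = stochastic_round(v, delta, rng)
        theta = stochastic_round(theta + step_size * v, delta, rng)
        samples[k] = theta
    return samples
```

As a usage example under the same assumptions, sampling a standard Gaussian target corresponds to `grad_fn = lambda theta, rng: theta + 0.1 * rng.standard_normal(theta.shape)`, called with `rng = np.random.default_rng(0)` and a power-of-two spacing such as `delta = 2.0 ** -6`. A full-precision accumulator variant would instead keep `theta` and `v` in floating point and quantize only the values passed to `grad_fn`.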
