LOTION: Smoothing the Optimization Landscape for Quantized Training
Abstract
Optimizing neural networks to minimize quantized loss is difficult as the quantized loss surface is discontinuous. Most previous methods deal with this issue by relaxing gradient computations using techniques such as the Straight-Through Estimator (STE). However, these algorithms do not provide any convergence guarantees. In this work, taking inspiration from Nesterov smoothing, we relax the loss function by approximating the quantized loss surface with a smoothed loss, defined as the expected quantized loss after perturbing the weights with random noise. In particular, we introduce LOTION, a principled smoothing framework that replaces the raw quantized loss with its expectation under unbiased stochastic-rounding noise. In this framework, standard optimizers are guaranteed to converge to a local minimum of the smoothed loss surface. Moreover, when using noise derived from stochastic rounding, we show that the global minima of the original quantized loss are preserved. We empirically demonstrate that this method outperforms QAT on synthetic testbeds and in large language model experiments.
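To make the construction concrete, the smoothed objective can be written as the expected loss under the stochastic-rounding perturbation; the notation below (weights $w$, loss $f$, stochastic-rounding operator $Q_{\mathrm{SR}}$, grid step $\Delta$) is our own shorthand for the idea described above, not notation taken from the paper.

% Sketch of the smoothed objective under unbiased stochastic rounding.
% Each coordinate is rounded up or down to the nearest grid point with
% probabilities chosen so that the rounding is unbiased.
\[
  \mathcal{L}_{\mathrm{smooth}}(w) \;=\; \mathbb{E}\!\left[\, f\big(Q_{\mathrm{SR}}(w)\big) \,\right],
  \qquad
  Q_{\mathrm{SR}}(w)_i \;=\;
  \begin{cases}
    \Delta \lfloor w_i/\Delta \rfloor + \Delta & \text{with probability } \dfrac{w_i - \Delta\lfloor w_i/\Delta \rfloor}{\Delta},\\[4pt]
    \Delta \lfloor w_i/\Delta \rfloor & \text{otherwise},
  \end{cases}
\]
so that $\mathbb{E}[Q_{\mathrm{SR}}(w)] = w$. Because the rounding probabilities vary linearly with $w$ between grid points, this expectation is continuous in $w$ even though the underlying quantized loss $f \circ Q$ is piecewise constant, which is what allows standard optimizers to be applied to it.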