Parameter-Agnostic Error Feedback Enhanced With Hessian-Corrected Momentum
Abdurakhmon Sadiev · Yury Demidovich · Grigory Malinovsky · Igor Sokolov · Sarit Khirirat · Peter Richtarik
Abstract
Advanced machine learning models often rely on massive datasets distributed across many nodes. To reduce communication overhead in large-scale stochastic optimization, compression is widely used, though it may introduce noise and harm convergence. Error feedback mitigates this by accumulating and reusing the compression error, while Hessian-vector products provide variance reduction and improve complexity. Building on these ideas, we design a distributed algorithm for finding $\varepsilon$-stationary points of nonconvex $L$-smooth functions that leverages error feedback, normalization, and second-order momentum. Unlike prior methods, which require knowledge of problem parameters to tune their stepsizes, our algorithm is parameter-agnostic: it uses an $\mathcal{O}(1)$ batch size and a time-varying learning rate independent of $L$ and the functional gap. The method achieves $\mathcal{O}(\varepsilon^{-3})$ communication complexity. We prove a lower bound showing that this rate is optimal. Together, these results establish the complexity of nonconvex distributed stochastic optimization with higher-order methods.
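The ingredients named above (compression with error feedback, Hessian-vector-product momentum, a normalized step, and a stepsize schedule that does not use $L$) can be illustrated with a minimal single-node sketch in Python. This is not the paper's distributed algorithm; the Top-k compressor, the momentum and stepsize schedules, and the toy quadratic objective are assumptions chosen only to make the mechanics concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T = 50, 5, 2000
A = np.diag(np.linspace(1.0, 10.0, d))   # toy quadratic: f(x) = 0.5 x^T A x - b^T x
b = rng.standard_normal(d)

def top_k(v, k):
    """Top-k sparsifier: keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def stoch_grad(x):
    """Stochastic gradient oracle for the toy quadratic (small additive noise)."""
    return A @ x - b + 0.01 * rng.standard_normal(d)

def stoch_hvp(x, u):
    """Stochastic Hessian-vector product oracle; exact for the quadratic."""
    return A @ u

x = np.zeros(d)
v = stoch_grad(x)        # Hessian-corrected momentum estimator of the gradient
g = top_k(v, k)          # error-feedback state (the "transmitted" gradient estimate)

for t in range(T):
    gamma = 0.5 / (t + 1) ** (1 / 3)   # time-varying stepsize, no knowledge of L (assumed exponent)
    beta = 1.0 / (t + 1) ** (2 / 3)    # momentum weight (assumed schedule)

    x_new = x - gamma * g / (np.linalg.norm(g) + 1e-12)   # normalized step
    # Second-order (Hessian-corrected) momentum: transport the previous estimate
    # along the step x_new - x before mixing in a fresh stochastic gradient.
    v = (1 - beta) * (v + stoch_hvp(x_new, x_new - x)) + beta * stoch_grad(x_new)
    # Error feedback: only a compressed correction toward v is communicated;
    # the remaining error v - g is carried over to future rounds.
    g = g + top_k(v - g, k)
    x = x_new

print("final gradient norm:", np.linalg.norm(A @ x - b))
```

In this sketch, the normalized step is what removes the dependence on $L$ from the stepsize, so a decaying schedule of the kind above can be used without tuning; in the distributed setting of the paper, each node would maintain its own error-feedback state and send only compressed corrections to the server.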