Parameter-Agnostic Error Feedback Enhanced With Hessian-Corrected Momentum
Abdurakhmon Sadiev · Yury Demidovich · Grigory Malinovsky · Igor Sokolov · Sarit Khirirat · Peter Richtarik
Abstract
Advanced machine learning models often rely on massive datasets distributed across many nodes. To reduce communication overhead in large-scale stochastic optimization, compression is widely used, though it may introduce noise and harm convergence. Error feedback mitigates this by accumulating and reusing the compression error, while Hessian-vector products provide variance reduction and improve complexity. Building on these ideas, we design a distributed algorithm for finding $\varepsilon$-stationary points of nonconvex $L$-smooth functions that leverages error feedback, normalization, and second-order momentum. Unlike prior methods, which require knowledge of problem parameters to tune their stepsizes, our algorithm is parameter-agnostic: it uses an $\mathcal{O}(1)$ batch size and a time-varying learning rate independent of $L$ and the functional gap. The method achieves $\mathcal{O}(\varepsilon^{-3})$ communication complexity. We prove a lower bound showing that this rate is optimal. Together, these results establish the complexity of nonconvex distributed stochastic optimization with higher-order methods.
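The ingredients named above (compression with error feedback, Hessian-vector-product momentum, a normalized step, and a stepsize schedule that does not use $L$) can be illustrated with a minimal single-node sketch in Python. This is not the paper's distributed algorithm; the Top-k compressor, the momentum and stepsize schedules, and the toy quadratic objective are assumptions chosen only to make the mechanics concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T = 50, 5, 2000
A = np.diag(np.linspace(1.0, 10.0, d))   # toy quadratic: f(x) = 0.5 x^T A x - b^T x
b = rng.standard_normal(d)

def top_k(v, k):
    """Top-k sparsifier: keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def stoch_grad(x):
    """Stochastic gradient oracle for the toy quadratic (small additive noise)."""
    return A @ x - b + 0.01 * rng.standard_normal(d)

def stoch_hvp(x, u):
    """Stochastic Hessian-vector product oracle; exact for the quadratic."""
    return A @ u

x = np.zeros(d)
v = stoch_grad(x)        # Hessian-corrected momentum estimator of the gradient
g = top_k(v, k)          # error-feedback state (the "transmitted" gradient estimate)

for t in range(T):
    gamma = 0.5 / (t + 1) ** (1 / 3)   # time-varying stepsize, no knowledge of L (assumed exponent)
    beta = 1.0 / (t + 1) ** (2 / 3)    # momentum weight (assumed schedule)

    x_new = x - gamma * g / (np.linalg.norm(g) + 1e-12)   # normalized step
    # Second-order (Hessian-corrected) momentum: transport the previous estimate
    # along the step x_new - x before mixing in a fresh stochastic gradient.
    v = (1 - beta) * (v + stoch_hvp(x_new, x_new - x)) + beta * stoch_grad(x_new)
    # Error feedback: only a compressed correction toward v is communicated;
    # the remaining error v - g is carried over to future rounds.
    g = g + top_k(v - g, k)
    x = x_new

print("final gradient norm:", np.linalg.norm(A @ x - b))
```

In this sketch, the normalized step is what removes the dependence on $L$ from the stepsize, so a decaying schedule of the kind above can be used without tuning; in the distributed setting of the paper, each node would maintain its own error-feedback state and send only compressed corrections to the server.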