

Poster

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

Anastasiia Koloskova · Sebastian Stich · Martin Jaggi

Hall J (level 1) #821

Keywords: [ Distributed Optimization ] [ Stochastic Optimization ] [ federated learning ] [ delayed SGD ] [ asynchronous SGD ]


Abstract: We study the asynchronous stochastic gradient descent algorithm for distributed training over $n$ workers that might be heterogeneous. In this algorithm, workers compute stochastic gradients in parallel at their own pace and return them to the server without any synchronization. Existing convergence rates of this algorithm for non-convex smooth objectives depend on the maximum delay $\tau_{\max}$ and reach an $\epsilon$-stationary point after $\mathcal{O}\left(\sigma^2\epsilon^{-2} + \tau_{\max}\epsilon^{-1}\right)$ iterations, where $\sigma^2$ is the variance of the stochastic gradients. In this work, (i) we obtain a tighter convergence rate of $\mathcal{O}\left(\sigma^2\epsilon^{-2} + \sqrt{\tau_{\max}\tau_{avg}}\,\epsilon^{-1}\right)$ *without any change in the algorithm*, where $\tau_{avg}$ is the average delay, which can be significantly smaller than $\tau_{\max}$. We also provide (ii) a simple delay-adaptive learning rate scheme, under which asynchronous SGD achieves a convergence rate of $\mathcal{O}\left(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1}\right)$, and which does not require any extra hyperparameter tuning or extra communication. Our result allows us to show *for the first time* that asynchronous SGD is *always faster* than mini-batch SGD. In addition, (iii) we consider the case of heterogeneous functions motivated by federated learning applications and improve the convergence rate by proving a weaker dependence on the maximum delay compared to prior works.
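To make the setting concrete, below is a minimal Python sketch of the asynchronous update loop described in the abstract: the server applies each incoming stochastic gradient immediately, and each gradient carries a delay $\tau_t$ equal to the number of server steps since its worker last read the model. The quadratic objective, the random order in which workers finish, and the specific delay-adaptive rule (step size scaled by $1/(1+\tau_t)$) are illustrative assumptions for this sketch, not the paper's exact schedule or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_steps = 10, 4, 200
base_lr, sigma = 0.1, 0.5          # nominal step size and gradient-noise level

x = rng.normal(size=d)             # model held by the server
A = np.diag(np.linspace(1.0, 5.0, d))   # toy quadratic f(x) = 0.5 * x^T A x

def stochastic_grad(x_stale):
    # Gradient of the quadratic, perturbed by noise with std deviation sigma.
    return A @ x_stale + sigma * rng.normal(size=d)

# Each worker holds one gradient "in flight", tagged with the server step
# at which it last read the model (step 0 initially).
inflight = [(0, stochastic_grad(x)) for _ in range(n_workers)]

for t in range(n_steps):
    w = int(rng.integers(n_workers))       # some worker finishes, in arbitrary order
    start, g = inflight[w]
    tau_t = t - start                      # delay of this gradient in server steps
    lr_t = base_lr / (1 + tau_t)           # illustrative delay-adaptive step size
    x = x - lr_t * g                       # server update, no synchronization barrier
    inflight[w] = (t, stochastic_grad(x))  # worker restarts from the current model

print("final gradient norm:", np.linalg.norm(A @ x))
```

For comparison, mini-batch SGD would make the server wait for all $n$ workers before every update; the point of result (ii) is that the asynchronous loop, with a suitable delay-adaptive step size, is never slower than that synchronous baseline.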
