

Poster

Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction

Wei Jiang · Sifan Yang · Wenhao Yang · Lijun Zhang

West Ballroom A-D #6011
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.
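The abstract does not spell out the update rule, but a minimal sketch of a sign-based variance-reduced loop might look like the following. The STORM-style recursive estimator, the `stoch_grad` interface, and the parameters `eta` and `beta` are illustrative assumptions rather than the paper's exact SSVR algorithm; `majority_vote_step` likewise only illustrates the sign-compressed aggregation described for the distributed majority-vote setting.

```python
import numpy as np

def ssvr_sketch(stoch_grad, x0, T, eta=1e-2, beta=0.1, rng=None):
    """Hypothetical sketch of a sign-based variance-reduced loop.

    stoch_grad(x, sample) -> stochastic gradient at x evaluated on `sample`;
    reusing the same sample at two points gives the correlated evaluations
    that a recursive variance-reduced estimator needs.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x_prev = x = np.asarray(x0, dtype=float)
    v = None
    for _ in range(T):
        sample = rng.integers(0, 2**31)            # shared randomness for both evaluations
        g_cur = stoch_grad(x, sample)
        if v is None:
            v = g_cur                              # initialize the tracker with a plain gradient
        else:
            g_old = stoch_grad(x_prev, sample)     # same sample, previous iterate
            v = g_cur + (1.0 - beta) * (v - g_old) # recursive variance-reduced tracker (assumed form)
        x_prev, x = x, x - eta * np.sign(v)        # sign-based step: only signs drive the update
    return x

def majority_vote_step(x, node_estimators, eta):
    """Illustrative server-side majority vote: each node transmits only
    sign(v_i); the server moves along the element-wise sign of the tally."""
    votes = np.sum([np.sign(v) for v in node_estimators], axis=0)
    return x - eta * np.sign(votes)

# Toy usage on a noisy quadratic: f(x) = 0.5 * ||x||^2 with Gaussian gradient noise.
def noisy_quad_grad(x, sample):
    noise_rng = np.random.default_rng(sample)
    return x + 0.1 * noise_rng.standard_normal(x.shape)

x_final = ssvr_sketch(noisy_quad_grad, x0=np.ones(10), T=500)
```

Note that both functions transmit or use only signs, which is what keeps the communication cost at one bit per coordinate in the distributed setting.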
