NeurIPS Poster High-probability complexity bounds for stochastic non-convex minimax optimization

Poster

High-probability complexity bounds for stochastic non-convex minimax optimization

Yassine Laguel · Yasa Syed · Necdet Serhat Aybat · Mert Gurbuzbalaban

West Ballroom A-D #6002

[ Abstract ]

[ Paper] [ OpenReview]

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Stochastic smooth nonconvex minimax problems are prevalent in machine learning, e.g., GAN training, fair classification, and distributionally robust learning. Stochastic gradient descent ascent (GDA)-type methods are popular in practice due to their simplicity and single-loop nature. However, there is a significant gap between the theory and practice regarding high-probability complexity guarantees for these methods on stochastic nonconvex minimax problems. Existing high-probability bounds for GDA-type single-loop methods only apply to convex/concave minimax problems and to particular non-monotone variational inequality problems under some restrictive assumptions. In this work, we address this gap by providing the first high-probability complexity guarantees for nonconvex/PL minimax problems corresponding to a smooth function that satisfies the PL-condition in the dual variable. Specifically, we show that when the stochastic gradients are light-tailed, the smoothed alternating GDA method can compute an

ε

$\varepsilon$ -stationary point within

O (\frac{ℓ κ^{2} δ^{2}}{ε^{4}} + \frac{κ}{ε^{2}} (ℓ + δ^{2} \log (1 / \bar{q})))

$\mathcal{O}(\frac{\ell \kappa^2 \delta^2}{\varepsilon^4} + \frac{\kappa}{\varepsilon^2}(\ell+\delta^2\log({1}/{\bar{q}})))$ stochastic gradient calls with probability at least

1 - \bar{q}

$1-\bar{q}$ for any

\bar{q} \in (0, 1)

$\bar{q}\in(0,1)$ , where

μ

$\mu$ is the PL constant,

ℓ

$\ell$ is the Lipschitz constant of the gradient,

κ = ℓ / μ

$\kappa=\ell/\mu$ is the condition number, and

δ^{2}

$\delta^2$ denotes a bound on the variance of stochastic gradients. We also present numerical results on a nonconvex/PL problem with synthetic data and on distributionally robust optimization problems with real data, illustrating our theoretical findings.

Chat is not available.