NeurIPS Poster Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization

Feihu Huang · Shangqian Gao · Jian Pei · Heng Huang

Hall J (level 1) #1007

Abstract: In this paper, we propose a class of accelerated zeroth-order and first-order momentum methods for both nonconvex mini-optimization and minimax-optimization. Specifically, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained. Moreover, we prove that our Acc-ZOM method achieves a lower query complexity of $\tilde{O}(d^{3/4}\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O(d^{1/4})$, where $d$ denotes the variable dimension. In particular, our Acc-ZOM does not need the large batches required by existing zeroth-order stochastic algorithms. Meanwhile, we propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method for black-box minimax optimization, where only function values can be obtained. Our Acc-ZOMDA obtains a low query complexity of $\tilde{O}((d_1+d_2)^{3/4}\kappa_y^{4.5}\epsilon^{-3})$ without requiring large batches for finding an $\epsilon$-stationary point, where $d_1$ and $d_2$ denote the variable dimensions and $\kappa_y$ is the condition number. Moreover, we propose an accelerated first-order momentum descent ascent (Acc-MDA) method for minimax optimization in which explicit gradients are accessible. Our Acc-MDA achieves a low gradient complexity of $\tilde{O}(\kappa_y^{4.5}\epsilon^{-3})$ without requiring large batches for finding an $\epsilon$-stationary point. In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5}\epsilon^{-3})$ with a batch size of $O(\kappa_y^4)$, which improves the best known result by a factor of $O(\kappa_y^{1/2})$. Extensive experimental results on black-box adversarial attacks on deep neural networks and poisoning attacks on logistic regression demonstrate the efficiency of our algorithms.
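To make the black-box setting concrete, the following is a minimal sketch of a generic zeroth-order gradient estimator that uses only function values. It is not the Acc-ZOM update from the paper: the random-direction (uniform smoothing) estimator, the function name `zo_gradient`, the smoothing radius `mu`, and the toy quadratic objective are all illustrative assumptions.

```python
import numpy as np


def zo_gradient(f, x, mu=1e-3, num_dirs=1, rng=None):
    """Estimate the gradient of f at x from function values only.

    Hypothetical helper (not from the paper): averages num_dirs
    one-sided finite differences along random unit directions.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    fx = f(x)  # one query at the base point, reused for every direction
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        # The factor d compensates for E[u u^T] = I / d.
        grad += (d / mu) * (f(x + mu * u) - fx) * u
    return grad / num_dirs


def quadratic(x):
    """Toy objective (an assumption for illustration); its gradient is x."""
    return 0.5 * float(np.dot(x, x))


rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 3.0, 0.5, -1.0])
estimate = zo_gradient(quadratic, x0, num_dirs=20000, rng=rng)
```

Each direction costs one extra function query, so `num_dirs` plays the role of a query batch size in this sketch. The variance of a single-direction estimate grows with the dimension $d$, which is why the dependence on $d$ appears in query-complexity bounds for methods of this kind.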
