
Contributed Video: PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization, Zhize Li

Fri Dec 11 05:00 AM -- 05:30 AM (PST)

In this paper, we propose a novel stochastic gradient estimator---the ProbAbilistic Gradient Estimator (PAGE)---for nonconvex optimization. PAGE is easy to implement, as it is a small adjustment to vanilla SGD: in each iteration, PAGE uses the vanilla minibatch SGD update with probability $p$, and with probability $1-p$ reuses the previous gradient with a small correction that has a much lower computational cost. We give a simple formula for the optimal choice of $p$. We prove tight lower bounds for nonconvex problems, which are of independent interest. Moreover, we prove matching upper bounds in both the finite-sum and online regimes, establishing that PAGE is an optimal method. In addition, we show that for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, PAGE automatically switches to a faster linear convergence rate. Finally, we conduct several deep learning experiments (e.g., LeNet, VGG, ResNet) on real datasets in PyTorch; the results demonstrate that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating our theoretical results and confirming the practical superiority of PAGE.
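The update described in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy implementation on a synthetic least-squares finite sum, not the authors' code: the problem setup, step size, batch sizes `b` and `b_prime`, and the choice `p = b_prime / (b + b_prime)` (on the order of the optimal probability suggested by the paper for the finite-sum regime) are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic finite-sum problem: f_i(x) = 0.5 * (a_i @ x - y_i)^2,
# with y generated so an exact minimizer exists.
n, d = 200, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
y = A @ x_star

def grad(x, idx):
    """Average gradient of f_i over the index set idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - y[idx]) / len(idx)

def page(eta=0.05, b=32, b_prime=4, p=None, T=500):
    # Assumed choice of p for this sketch: b'/(b + b').
    if p is None:
        p = b_prime / (b + b_prime)
    x = np.zeros(d)
    g = grad(x, np.arange(n))  # initialize with a full-batch gradient
    for _ in range(T):
        x_new = x - eta * g
        if rng.random() < p:
            # With probability p: vanilla minibatch SGD gradient
            # (cost: b per-sample gradient evaluations).
            idx = rng.choice(n, size=b, replace=False)
            g = grad(x_new, idx)
        else:
            # With probability 1-p: reuse the previous gradient, adding a
            # cheap small-batch correction (cost: 2 * b_prime evaluations).
            idx = rng.choice(n, size=b_prime, replace=False)
            g = g + grad(x_new, idx) - grad(x, idx)
        x = x_new
    return x

x_hat = page()
loss = 0.5 * np.mean((A @ x_hat - y) ** 2)
```

Because most iterations take the cheap `1-p` branch, the expected per-iteration cost is close to `b_prime` gradient evaluations rather than `b`, which is the source of PAGE's savings over plain minibatch SGD at the same batch size.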

#### Author Information

##### Zhize Li (Tsinghua University, and KAUST)

Zhize Li has been a Research Scientist at the King Abdullah University of Science and Technology (KAUST) since September 2020. He obtained his PhD in Computer Science from Tsinghua University in 2019 (advisor: Prof. Jian Li). He was previously a postdoc at KAUST (hosted by Prof. Peter Richtárik), a visiting scholar at Duke University (hosted by Prof. Rong Ge), and a visiting scholar at the Georgia Institute of Technology (hosted by Prof. Guanghui (George) Lan).