Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
Dongruo Zhou · Pan Xu · Quanquan Gu

Wed Dec 5th 05:00 -- 07:00 PM @ Room 210 #44
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic variance reduced gradient (SVRG) algorithm that uses two reference points to construct a semi-stochastic gradient with diminishing variance in each epoch, our algorithm uses $K+1$ nested reference points to build an semi-stochastic gradient to further reduce its variance in each epoch. For smooth functions, the proposed algorithm converges to an approximate first order stationary point (i.e., $\|\nabla F(\xb)\|_2\leq \epsilon$) within $\tO(n\land \epsilon^{-2}+\epsilon^{-3}\land n^{1/2}\epsilon^{-2})$\footnote{$\tO(\cdot)$ hides the logarithmic factors} number of stochastic gradient evaluations, where $n$ is the number of component functions, and $\epsilon$ is the optimization error. This improves the best known gradient complexity of SVRG $O(n+n^{2/3}\epsilon^{-2})$ and the best gradient complexity of SCSG $O(\epsilon^{-5/3}\land n^{2/3}\epsilon^{-2})$. For gradient dominated functions, our algorithm achieves $\tO(n\land \tau\epsilon^{-1}+\tau\cdot (n^{1/2}\land (\tau\epsilon^{-1})^{1/2})$ gradient complexity, which again beats the existing best gradient complexity $\tO(n\land \tau\epsilon^{-1}+\tau\cdot (n^{1/2}\land (\tau\epsilon^{-1})^{2/3})$ achieved by SCSG. Thorough experimental results on different nonconvex optimization problems back up our theory.

Author Information

Dongruo Zhou (UCLA)
Pan Xu (UCLA)
Quanquan Gu (UCLA)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors