

Poster

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Dongruo Zhou · Pan Xu · Quanquan Gu

Room 210 #44

Keywords: [ Non-Convex Optimization ]


Abstract: We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each epoch, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient whose variance is further reduced in each epoch. For smooth functions, the proposed algorithm converges to an approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2 \leq \epsilon$) within $\tilde{O}(n \wedge \epsilon^{-2} + \epsilon^{-3} \wedge n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations (here $\tilde{O}(\cdot)$ hides logarithmic factors), where $n$ is the number of component functions and $\epsilon$ is the optimization error. This improves the best known gradient complexity of SVRG, $O(n + n^{2/3}\epsilon^{-2})$, and the best gradient complexity of SCSG, $O(\epsilon^{-5/3} \wedge n^{2/3}\epsilon^{-2})$. For gradient dominated functions, our algorithm achieves a gradient complexity of $\tilde{O}(n \wedge \tau\epsilon^{-1} + \tau\,(n^{1/2} \wedge (\tau\epsilon^{-1})^{1/2}))$, where $\tau$ is the gradient dominance parameter, which again beats the existing best gradient complexity $\tilde{O}(n \wedge \tau\epsilon^{-1} + \tau\,(n^{1/2} \wedge (\tau\epsilon^{-1})^{2/3}))$ achieved by SCSG. Thorough experimental results on different nonconvex optimization problems back up our theory.
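The mechanism described above is a nested semi-stochastic gradient: a large-batch gradient evaluated at the outermost reference point plus $K$ mini-batch correction terms between successively fresher reference points, with the innermost "reference point" being the current iterate. The sketch below is a minimal Python illustration of that estimator under stated assumptions; the toy objective, batch sizes, refresh schedule, and the helper names `grad_batch` and `nested_semi_stochastic_gradient` are invented for demonstration and do not reproduce the paper's exact SNVRG construction (which caches the level-wise gradients and uses carefully chosen epoch lengths and batch sizes).

```python
import numpy as np

# Toy nonconvex finite sum f(x) = (1/n) * sum_i [0.5*(a_i^T x - b_i)^2 + 0.1*sum(sin(x))],
# used only to make the sketch runnable; it is not the objective studied in the paper.
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_batch(x, idx):
    """Mini-batch gradient of the toy nonconvex objective over components idx."""
    residual = A[idx] @ x - b[idx]
    return A[idx].T @ residual / len(idx) + 0.1 * np.cos(x)

def nested_semi_stochastic_gradient(x, refs, batch_sizes):
    """Semi-stochastic gradient built from K+1 nested reference points.

    refs = [x_ref_0, ..., x_ref_{K-1}] are reference points refreshed at
    decreasing frequencies (refs[0] least often); x is the current iterate.
    Estimator: grad_{B_0}(refs[0]) + sum_l [grad_{B_l}(point_l) - grad_{B_l}(point_{l-1})],
    where point_K = x and level-l batches B_l shrink as l grows.
    """
    points = refs + [x]                       # K+1 points, innermost is the iterate
    idx0 = rng.choice(n, size=batch_sizes[0], replace=False)
    v = grad_batch(points[0], idx0)           # anchor: large-batch gradient at outer reference
    for l in range(1, len(points)):
        idx = rng.choice(n, size=batch_sizes[l], replace=False)
        v += grad_batch(points[l], idx) - grad_batch(points[l - 1], idx)
    return v

# Illustrative run with K = 2 nested levels beyond the outer reference point.
x = np.zeros(d)
refs = [x.copy(), x.copy()]
eta = 0.05
for t in range(50):
    v = nested_semi_stochastic_gradient(x, refs, batch_sizes=[n, 100, 10])
    x = x - eta * v
    if t % 25 == 0:       # refresh the outer reference point rarely
        refs[0] = x.copy()
    if t % 5 == 0:        # refresh the inner reference point more often
        refs[1] = x.copy()
```

In this sketch each level's correction term uses an independent small mini-batch, so the frequently refreshed inner levels stay cheap while the occasional large-batch gradient at the outer reference point anchors the estimate; this is the intuition behind the variance reduction, not a faithful reimplementation of the algorithm's schedule.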
