Poster
Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
Dongruo Zhou · Pan Xu · Quanquan Gu
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each epoch, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient whose variance is further reduced in each epoch. For smooth functions, the proposed algorithm converges to an approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2\leq \epsilon$) within $\tilde{O}(n\wedge \epsilon^{-2}+\epsilon^{-3}\wedge n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations, where $n$ is the number of component functions, $\epsilon$ is the optimization error, $a\wedge b$ denotes $\min\{a,b\}$, and $\tilde{O}(\cdot)$ hides logarithmic factors. This improves the best known gradient complexity of SVRG, $O(n+n^{2/3}\epsilon^{-2})$, and the best known gradient complexity of SCSG, $O(\epsilon^{-5/3}\wedge n^{2/3}\epsilon^{-2})$. For gradient dominated functions, our algorithm achieves gradient complexity $\tilde{O}(n\wedge \tau\epsilon^{-1}+\tau\cdot (n^{1/2}\wedge (\tau\epsilon^{-1})^{1/2}))$, which again beats the best existing gradient complexity $\tilde{O}(n\wedge \tau\epsilon^{-1}+\tau\cdot (n^{1/2}\wedge (\tau\epsilon^{-1})^{2/3}))$ achieved by SCSG, where $\tau$ is the gradient dominance parameter. Thorough experimental results on different nonconvex optimization problems back up our theory.
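As a rough illustration of the nesting idea described above (a minimal sketch, not the authors' exact SNVRG update rule or its tuned reference-point schedule), the following NumPy snippet builds a semi-stochastic gradient by telescoping gradient differences across a list of reference points. The helper grad_f, the reference-point list refs, and the batch_sizes schedule are hypothetical placeholders supplied by the caller.

import numpy as np


def nested_gradient_estimate(grad_f, n, refs, batch_sizes, x, rng):
    """Illustrative nested semi-stochastic gradient (SVRG-style when len(refs) == 1).

    grad_f(i, x) -- gradient of the i-th component function at x (assumed provided).
    refs         -- reference points, outermost first; the current iterate x plays
                    the role of the innermost reference point.
    batch_sizes  -- one minibatch size per nesting level, len(refs) + 1 entries.
    """
    points = list(refs) + [x]
    # Base term: minibatch estimate of the full gradient at the outermost reference point.
    idx = rng.choice(n, size=batch_sizes[0], replace=False)
    v = np.mean([grad_f(i, points[0]) for i in idx], axis=0)
    # Correction terms: gradient differences between consecutive reference points,
    # estimated on (typically smaller, cheaper) minibatches at the inner levels.
    for level in range(1, len(points)):
        idx = rng.choice(n, size=batch_sizes[level], replace=False)
        v += np.mean(
            [grad_f(i, points[level]) - grad_f(i, points[level - 1]) for i in idx],
            axis=0,
        )
    return v

With a single reference point and batch_sizes[0] = n, this reduces to the standard SVRG semi-stochastic gradient; adding further levels inserts cheap small-batch corrections between consecutive reference points, which is the variance-reduction mechanism the abstract refers to.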
Author Information
Dongruo Zhou (UCLA)
Pan Xu (UCLA)
Quanquan Gu (UCLA)
Related Events (a corresponding poster, oral, or spotlight)
- 2018 Spotlight: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization
  Wed Dec 5th 09:05 -- 09:10 PM, Room 517 CD
More from the Same Authors
- 2018 Poster: Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima
  Yaodong Yu · Pan Xu · Quanquan Gu
- 2018 Poster: Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
  Pan Xu · Jinghui Chen · Difan Zou · Quanquan Gu
- 2018 Spotlight: Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
  Pan Xu · Jinghui Chen · Difan Zou · Quanquan Gu
- 2018 Poster: Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization
  Bargav Jayaraman · Lingxiao Wang · David Evans · Quanquan Gu
- 2017 Poster: Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization
  Pan Xu · Jian Ma · Quanquan Gu
- 2016 Poster: Semiparametric Differential Graph Models
  Pan Xu · Quanquan Gu
- 2015 Poster: High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality
  Zhaoran Wang · Quanquan Gu · Yang Ning · Han Liu
- 2014 Poster: Sparse PCA with Oracle Property
  Quanquan Gu · Zhaoran Wang · Han Liu
- 2014 Poster: Robust Tensor Decomposition with Gross Corruption
  Quanquan Gu · Huan Gui · Jiawei Han
- 2012 Poster: Selective Labeling via Error Bound Minimization
  Quanquan Gu · Tong Zhang · Chris Ding · Jiawei Han