

Poster

Escape saddle points by a simple gradient-descent based algorithm

Chenyi Zhang · Tongyang Li

Virtual

Keywords: [ Optimization ]


Abstract: Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon \mathbb{R}^n \to \mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}(\log^4 n/\epsilon^2)$ or $\tilde{O}(\log^6 n/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log^2 n/\epsilon^4)$ iterations. Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve the polynomial speedup in $\log n$ compared to the perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.
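To illustrate the core idea described in the abstract (not the authors' exact algorithm or tuned parameters), the sketch below approximates Hessian-vector products with finite differences of gradients and runs a power iteration on $I - \eta H$, whose top eigenvector aligns with the most negative curvature direction of $H$ when $\eta$ is small enough. All function names and hyperparameter values here are illustrative assumptions.

```python
import numpy as np

def negative_curvature_direction(grad, x, eta=0.01, r=1e-5, iters=300, rng=None):
    """Minimal sketch: estimate a negative-curvature direction at x using only
    gradient evaluations. The Hessian-vector product H v is approximated by the
    finite difference (grad(x + r*v) - grad(x)) / r, and the power iteration is
    applied to the shifted operator I - eta*H. Hyperparameters are illustrative,
    not the values analyzed in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    g0 = grad(x)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = (grad(x + r * v) - g0) / r   # approximate Hessian-vector product H v
        v = v - eta * hv                  # one power-iteration step on (I - eta*H) v
        v /= np.linalg.norm(v)
    # Rayleigh quotient v^T H v estimates the curvature along the returned direction
    curvature = v @ ((grad(x + r * v) - g0) / r)
    return v, curvature


# Toy usage on f(x0, x1) = x0^2 - x1^2, which has a strict saddle at the origin.
if __name__ == "__main__":
    grad = lambda x: np.array([2 * x[0], -2 * x[1]])
    v, lam = negative_curvature_direction(grad, x=np.array([1e-3, 1e-3]))
    print(v, lam)   # expect v close to (0, +/-1) and lam close to -2
```

Stepping along the returned direction (in the sign that decreases $f$) is one way such a negative-curvature estimate can be used to leave the neighborhood of a saddle point.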
