NeurIPS Poster Sharp Analysis of Stochastic Optimization under Global Kurdyka-Lojasiewicz Inequality

Ilyas Fatkhullin · Jalal Etesami · Niao He · Negar Kiyavash

Hall J (level 1) #841

Keywords: [ first order method ] [ Kurdyka-Lojasiewicz condition ] [ Stochastic Optimization ] [ variance reduction ] [ nonconvex optimization ]

Abstract: We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies the global Kurdyka-Łojasiewicz (KL) inequality and the queries from stochastic gradient oracles satisfy a mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics in this setting. As a byproduct of our analysis, we obtain a sample complexity of $\mathcal{O}(\epsilon^{-(4-\alpha)/\alpha})$ for SGD when the objective satisfies the so-called $\alpha$-PŁ condition, where $\alpha$ is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of $\mathcal{O}(\epsilon^{-2/\alpha})$ when the objective satisfies the average smoothness assumption. This leads to the first optimal algorithm for the important case of $\alpha=1$, which appears in applications such as policy optimization in reinforcement learning.
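
The abstract does not spell out the $\alpha$-PŁ condition. One generic way to write a gradient-domination inequality of degree $\alpha$ is shown here as an assumption (the paper's exact constants may differ, and the range $\alpha \in [1, 2]$ is assumed rather than quoted):

$$
\|\nabla f(x)\|^{\alpha} \;\ge\; c\,\bigl(f(x) - f^{*}\bigr) \quad \text{for all } x, \qquad c > 0,\ \alpha \in [1, 2],
$$

where $f^{*}$ is the optimal value. Taking $\alpha = 2$ and $c = 2\mu$ recovers the standard Polyak-Łojasiewicz inequality $\|\nabla f(x)\|^{2} \ge 2\mu\,(f(x) - f^{*})$, while $\alpha = 1$ gives the weaker form $f(x) - f^{*} \le c^{-1}\|\nabla f(x)\|$, the case the abstract associates with policy optimization.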
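
To make the two rates concrete, this short sketch only evaluates the exponents stated in the abstract, $(4-\alpha)/\alpha$ for SGD and $2/\alpha$ for PAGER, at a few illustrative values of $\alpha$ (the chosen values and the helper names `sgd_exponent`, `pager_exponent`, `gap` are hypothetical). The difference of the exponents is $(2-\alpha)/\alpha$, so the gain from variance reduction vanishes at $\alpha = 2$ and, over $\alpha \in [1, 2]$, is largest at $\alpha = 1$, where the rate improves from $\epsilon^{-3}$ to $\epsilon^{-2}$.

```python
from fractions import Fraction


def sgd_exponent(alpha):
    # SGD under the alpha-PL condition: O(eps^-((4 - alpha) / alpha)).
    a = Fraction(alpha)
    return (4 - a) / a


def pager_exponent(alpha):
    # PAGER under alpha-PL plus average smoothness: O(eps^-(2 / alpha)).
    return 2 / Fraction(alpha)


def gap(alpha):
    # Difference of the two exponents, equal to (2 - alpha) / alpha.
    return sgd_exponent(alpha) - pager_exponent(alpha)


for alpha in (Fraction(1), Fraction(3, 2), Fraction(2)):
    print(f"alpha={alpha}: SGD eps^-{sgd_exponent(alpha)}, "
          f"PAGER eps^-{pager_exponent(alpha)}")
# alpha=1: SGD eps^-3, PAGER eps^-2
# alpha=3/2: SGD eps^-5/3, PAGER eps^-4/3
# alpha=2: SGD eps^-1, PAGER eps^-1
```

Exact fractions are used so that exponents such as 5/3 print without floating-point rounding.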
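
PAGER is described here only as SGD with variance reduction and restarting. The sketch assumes, as the name suggests, that the variance reduction is a PAGE-style estimator: with probability $p$ the gradient estimate is refreshed from a large minibatch of size $b$, and otherwise it is corrected with a small-batch (size $b'$) gradient difference evaluated on the same samples at both iterates. Restarting is assumed to mean rerunning the estimator in stages with a growing batch size. The toy least-squares objective (which satisfies the condition with $\alpha = 2$), the stage schedule, step size, batch sizes, the choice $p = b'/(b + b')$, and all function names are illustrative assumptions, not the paper's algorithm or parameters.

```python
import numpy as np


def make_problem(n=200, d=10, seed=0):
    # Toy finite sum f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2.
    # With full column rank A this satisfies the PL condition (alpha = 2).
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true + 0.1 * rng.standard_normal(n)
    return A, b


def loss(A, b, x):
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)


def minibatch_grad(A, b, x, idx):
    # Average of per-sample gradients a_i * (a_i @ x - b_i) over indices idx.
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)


def page_stage(A, b, x, steps, lr, batch, small_batch, rng):
    """One stage: restart the estimator from a large minibatch, then take
    `steps` steps, refreshing it with probability p and otherwise
    correcting it with a small-batch gradient difference."""
    n = A.shape[0]
    p = small_batch / (batch + small_batch)  # assumed PAGE-style choice
    g = minibatch_grad(A, b, x, rng.integers(0, n, size=batch))
    for _ in range(steps):
        x_new = x - lr * g
        if rng.random() < p:
            g = minibatch_grad(A, b, x_new, rng.integers(0, n, size=batch))
        else:
            idx = rng.integers(0, n, size=small_batch)
            # Same samples at both iterates, so the correction has low variance.
            g = g + minibatch_grad(A, b, x_new, idx) - minibatch_grad(A, b, x, idx)
        x = x_new
    return x


def pager_sketch(A, b, stages=6, steps=50, lr=0.05, batch=8, seed=1):
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(stages):
        small = max(1, int(np.sqrt(batch)))
        x = page_stage(A, b, x, steps, lr, batch, small, rng)
        batch *= 2  # hypothetical restart schedule: double the large batch
    return x


A, b = make_problem()
x_hat = pager_sketch(A, b)
print(loss(A, b, np.zeros(A.shape[1])), loss(A, b, x_hat))
```

On this toy problem the final loss should land close to the least-squares optimum. The design point the sketch illustrates is that each correction term reuses the same indices at both iterates, so its variance scales with the step length rather than with the raw gradient noise.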
