

Poster

Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions

Saeed Masiha · Saber Salehkaleybar · Niao He · Negar Kiyavash · Patrick Thiran

Hall J (level 1) #826

Keywords: [ Gradient-dominated functions ] [ Reinforcement Learning ] [ Stochastic Optimization ] [ second-order methods ]


Abstract: We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with $1 \le \alpha \le 2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving an $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1 \le \alpha < 3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2 \le \alpha \le 2$. SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to $\mathcal{O}(\epsilon^{-2})$ for $\alpha = 1$ using a variance reduction method with time-varying batch sizes. Experimental results in various RL settings showcase the remarkable performance of SCRN compared to first-order methods.
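For context, here is a brief sketch of the two objects named in the abstract: the gradient dominance condition with exponent $\alpha$ and the cubic-regularized Newton step underlying SCRN. The formulation below follows the standard definitions from the cubic-regularization literature; the constants $\tau_f$ and $M$ and the exact notation are assumptions for illustration, not quoted from the paper.

% Gradient dominance with exponent \alpha (standard form; \tau_f > 0 is a
% problem-dependent constant -- an assumption for illustration):
%   any first-order stationary point (\nabla f(x) = 0) is then a global minimum.
\[
  f(x) - \inf_{y} f(y) \;\le\; \tau_f \,\bigl\|\nabla f(x)\bigr\|^{\alpha},
  \qquad 1 \le \alpha \le 2 .
\]
% Each cubic-regularized Newton iteration minimizes a second-order model built
% from stochastic gradient and Hessian estimates g_t \approx \nabla f(x_t) and
% H_t \approx \nabla^2 f(x_t), with cubic penalty parameter M > 0 (standard
% Nesterov-Polyak step, stated here as an assumption about the method's form):
\[
  x_{t+1} \;=\; x_t + \arg\min_{s}
  \Bigl\{ g_t^{\top} s + \tfrac{1}{2}\, s^{\top} H_t\, s
          + \tfrac{M}{6}\,\|s\|^{3} \Bigr\}.
\]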
