NeurIPS Poster On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions

Yusu Hong · Junhong Lin

West Ballroom A-D #6108

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: In this paper, we study Adam in non-convex smooth scenarios with potentially unbounded gradients and affine variance noise. We consider a general noise model that covers affine variance noise, bounded noise, and sub-Gaussian noise. We show that Adam with a specific hyper-parameter setup can find a stationary point at an $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate with high probability under this general noise model, where $T$ denotes the total number of iterations. This matches the lower bound for stochastic first-order algorithms up to logarithmic factors. We also provide a probabilistic convergence result for Adam under a generalized smoothness condition, which allows unbounded smoothness parameters and has been shown empirically to capture the smoothness of many practical objective functions more accurately.
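To make the setting concrete, here is a minimal Python sketch of the standard Adam update (Kingma & Ba, with bias correction) run on a toy problem whose gradient noise has affine variance. Everything problem-specific is an assumption for illustration only: the quadratic objective, the hypothetical oracle `stochastic_grad` with noise parameters `sigma0`/`sigma1`, and the $T$-dependent choices $\beta_2 = 1 - 1/T$ and $\eta = \eta_0/\sqrt{T}$. These resemble the kind of hyper-parameter setup used in such analyses but are not claimed to be the paper's exact setup. Stationarity is tracked as $\min_t \|\nabla f(x_t)\|$, one common measure.

```python
import math
import random


def stochastic_grad(x, sigma0=0.5, sigma1=0.5, rng=random):
    """Hypothetical oracle for f(x) = 0.5 * sum(x_i^2), so grad f(x) = x.

    Per-coordinate noise std is sigma0 + sigma1 * |grad_i|, so the noise
    variance is bounded by an affine function of ||grad f(x)||^2 rather
    than by a constant (affine variance noise).
    """
    return [xi + rng.gauss(0.0, sigma0 + sigma1 * abs(xi)) for xi in x]


def adam(x0, T, eta0=0.1, beta1=0.9, eps=1e-8, seed=0):
    """Run T steps of Adam; return the final iterate and min_t ||grad f(x_t)||."""
    rng = random.Random(seed)
    beta2 = 1.0 - 1.0 / T          # assumption: beta2 tends to 1 as T grows
    eta = eta0 / math.sqrt(T)      # assumption: step size scales as 1/sqrt(T)
    x = list(x0)
    m = [0.0] * len(x)             # first-moment (momentum) estimate
    v = [0.0] * len(x)             # second-moment estimate
    min_grad_norm = math.sqrt(sum(xi * xi for xi in x))
    for t in range(1, T + 1):
        g = stochastic_grad(x, rng=rng)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias-corrected moment estimates.
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        x = [xi - eta * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
        # For this toy objective the true gradient at x is x itself.
        min_grad_norm = min(min_grad_norm, math.sqrt(sum(xi * xi for xi in x)))
    return x, min_grad_norm


if __name__ == "__main__":
    T = 1000
    x_final, best = adam([1.0, -2.0], T=T, eta0=1.0, seed=0)
    print(f"min ||grad f(x_t)|| over {T} steps: {best:.4f}")
```

Varying `T` in such a toy run only illustrates the mechanics; it does not reproduce the paper's high-probability bound, which concerns worst-case behavior over a whole problem class.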

Chat is not available.