Poster
How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
Zeyuan Allen-Zhu
Room 210 #74
Keywords: [ Online Learning ] [ Stochastic Methods ]
Abstract:
Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives f(x). However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when f(x) is convex.
If f(x) is convex, to find a point with gradient norm at most ε, we design an algorithm SGD3 with a near-optimal rate Õ(ε^{-2}), improving the best known rate O(ε^{-8/3}). If f(x) is nonconvex, to find an ε-approximate local minimum, we design an algorithm SGD5 with rate Õ(ε^{-3.5}), where previously SGD variants only achieve Õ(ε^{-4}). This is no slower than the best known stochastic version of Newton's method in all parameter regimes.
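For context, the sketch below is a minimal illustration of the "making gradients small" criterion: plain SGD on a convex stochastic objective, reporting the exact gradient norm over time. It is not the paper's SGD3 or SGD5; the quadratic objective, noise model, and step-size schedule are illustrative assumptions only.

import numpy as np

# Illustrative convex quadratic f(x) = 0.5 x^T Q x - b^T x (assumed, not from the paper)
rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((d, d))
Q = A.T @ A / d + np.eye(d)          # positive definite, so f is convex
b = rng.standard_normal(d)

def full_gradient(x):
    """Exact gradient of f(x) = 0.5 x^T Q x - b^T x."""
    return Q @ x - b

def stochastic_gradient(x, noise_std=1.0):
    """Unbiased stochastic gradient: exact gradient plus Gaussian noise."""
    return full_gradient(x) + noise_std * rng.standard_normal(d)

x = np.zeros(d)
T = 10_000
for t in range(1, T + 1):
    eta = 0.2 / np.sqrt(t)           # conservative O(1/sqrt(t)) step size
    x -= eta * stochastic_gradient(x)
    if t % 2000 == 0:
        # Track how small the (exact) gradient has become, the quantity of interest here
        print(f"iter {t:5d}  ||grad f(x)|| = {np.linalg.norm(full_gradient(x)):.4f}")

With this schedule the suboptimality of f decreases at the usual SGD rate, but the printed gradient norm plateaus at a level set by the noise, which is the gap the paper's algorithms are designed to close.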