Benefits of instability in gradient optimization methods
Peter Bartlett
Abstract
Optimization in modern machine learning relies on simple gradient descent algorithms that are traditionally viewed as time discretizations of a stable differential equation. In practice, however, step sizes large enough to cause the loss to oscillate often exhibit performance advantages. This talk will review recent results on gradient descent on the logistic loss with a step size large enough that the optimization trajectory sits at the "edge of stability." We show the benefits of this initial oscillatory phase in logistic regression, for both convex and strongly convex loss landscapes.
Based on joint work with Jingfeng Wu, Pierre Marion, Matus Telgarsky and Bin Yu.
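As a rough illustration of the setting described in the abstract (not the authors' exact construction; the synthetic dataset, step sizes, and iteration budget below are arbitrary choices), the following sketch runs full-batch gradient descent on the logistic loss of a small linearly separable problem, once with a conventionally small step size and once with a step size far above the classical stability threshold, and reports whether the loss decreases monotonically.

```python
# Illustrative sketch only: gradient descent on logistic loss with a small vs. a
# large step size. Data, step sizes, and iteration count are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data with labels in {-1, +1}.
n, d = 100, 2
X = rng.normal(size=(n, d))
w_star = np.array([2.0, -1.0])
y = np.sign(X @ w_star)

def logistic_loss(w):
    # Average logistic loss: (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def gradient(w):
    # Per-sample gradient is -sigmoid(-margin_i) * y_i * x_i; tanh form avoids overflow.
    margins = y * (X @ w)
    coeffs = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return (X * coeffs[:, None]).mean(axis=0)

def run_gd(step_size, num_steps=200):
    w = np.zeros(d)
    losses = []
    for _ in range(num_steps):
        losses.append(logistic_loss(w))
        w = w - step_size * gradient(w)
    return np.array(losses)

small = run_gd(step_size=0.5)   # well inside the conventionally "stable" regime
large = run_gd(step_size=50.0)  # far above the classical stability threshold

print("small step, monotone decrease:", bool(np.all(np.diff(small) <= 1e-12)))
print("large step, early loss increase:", bool(np.any(np.diff(large[:20]) > 0)))
print("final losses:", small[-1], large[-1])
```

The diagnostics simply report the observed behavior for this toy instance; the point of the exercise is that a non-monotone (oscillatory) early phase need not prevent, and can coexist with, eventual convergence of the loss.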