Timezone: »

On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
Panayotis Mertikopoulos · Nadav Hallak · Ali Kavis · Volkan Cevher

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #817
In this paper, we analyze the trajectories of stochastic gradient descent (SGD) with the aim of understanding their convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, we prove that the algorithm's rate of convergence to local minimizers with a positive-definite Hessian is $O(1/n^p)$ if the method is run with a $Θ(1/n^p)$ step-size. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to significant performance gains; we demonstrate this heuristic using ResNet architectures on CIFAR. Finally, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered.

Author Information

Panayotis Mertikopoulos (CNRS (French National Center for Scientific Research) and Criteo AI Lab)
Nadav Hallak (EPFL)
Ali Kavis (EPFL)
Volkan Cevher (EPFL)

More from the Same Authors