Exploring Generalization in Convex Optimization, Tomer Koren
Abstract
A classic and fundamental result in convex optimization, dating back to Nemirovski and Yudin (1983), is that one-pass stochastic gradient descent (SGD) converges optimally in terms of population excess risk for any convex Lipschitz objective, independently of the problem dimension. In modern treatments, this is often viewed as a direct consequence of regret minimization through an online-to-batch conversion. In this talk, I will discuss several basic questions concerning this seminal result:
1. What form of capacity control enables this optimal out-of-sample performance of SGD?
2. At what rate does SGD minimize the empirical risk?
3. What happens beyond the first pass? How quickly does overfitting occur (if at all)?
I will present several recent results that shed light on these questions and reveal intriguing phenomena in the generalization behavior of optimization methods within the classical convex setting, and will highlight connections to the ongoing discussion of generalization in overparameterized regimes.
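To make the object of study concrete, here is a minimal sketch of one-pass SGD with iterate averaging (the standard online-to-batch conversion): each fresh sample is used exactly once for a stochastic gradient step, and the averaged iterate is returned. The step size, sample distribution, and toy objective below are illustrative assumptions, not taken from the talk.

```python
import random

def one_pass_sgd(sample_grad, w0, eta, n):
    """One pass of SGD with iterate averaging (online-to-batch conversion).

    sample_grad(w) must return a stochastic gradient at w computed from a
    single fresh sample; each of the n samples is consumed exactly once.
    """
    w = w0
    avg = 0.0
    for t in range(1, n + 1):
        g = sample_grad(w)
        w = w - eta * g          # one stochastic gradient step per sample
        avg += (w - avg) / t     # running average of the iterates
    return avg

# Toy illustration (assumed, not from the talk): minimize the population risk
# E[(w - Z)^2] with Z ~ Uniform(0, 2), whose minimizer is the mean w* = 1.
random.seed(0)
w_hat = one_pass_sgd(lambda w: 2 * (w - random.uniform(0, 2)),
                     w0=0.0, eta=0.01, n=10_000)
```

The averaged iterate `w_hat` lands near the population minimizer even though each sample is seen only once, which is the out-of-sample guarantee the abstract refers to.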