Moderator: Sham Kakade
Deep learning has revealed some major surprises from the perspective of statistical complexity: even without any explicit effort to control model complexity, these methods find prediction rules that give a near-perfect fit to noisy training data and yet exhibit excellent prediction performance in practice. This talk surveys work on methods that predict accurately in probabilistic settings despite fitting too well to training data. We present a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. We discuss implications for robustness to adversarial examples, and we describe extensions to ridge regression and barriers to analyzing benign overfitting via model-dependent generalization bounds.