Poster
Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan · J. Zico Kolter
East Exhibition Hall B, C #229
Keywords: [ Learning Theory ] [ Theory ] [ Regularization ] [ Deep Learning ]
Outstanding New Directions Paper
Abstract:
Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can {\em increase} with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot ``explain generalization'' -- even if we take into account the implicit bias of GD {\em to the fullest extent possible}. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
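As an informal sketch of the central claim (the notation below is my own shorthand and may differ from the paper's): let $\mathcal{L}_{\mathcal{D}}(h)$ denote the test error of a classifier $h$, let $\hat{\mathcal{L}}_S(h)$ denote its empirical error on a training set $S \sim \mathcal{D}^m$, and let $\mathcal{H}_{\mathrm{GD}}$ denote the set of classifiers that GD outputs on typical draws of the training set. In the paper's settings every $h \in \mathcal{H}_{\mathrm{GD}}$ has test error at most $\epsilon$, yet, with high probability over $S$,
$$\sup_{h \in \mathcal{H}_{\mathrm{GD}}} \left| \mathcal{L}_{\mathcal{D}}(h) - \hat{\mathcal{L}}_S(h) \right| \;\gtrsim\; 1-\epsilon,$$
so even the tightest (two-sided) uniform convergence bound computed over this algorithm-dependent class can only certify a test error larger than $1-\epsilon$.

The flavor of this phenomenon can be illustrated numerically with a toy overparameterized linear model. The sketch below is a loose simplification inspired by the abstract, not the paper's exact construction: a linear classifier obtained by one step of gradient descent memorizes the high-dimensional noise in its training set, generalizes well anyway, and yet misclassifies a copy of the training set whose noise coordinates are negated (a set just as likely under the data distribution), which is exactly the kind of behavior that makes two-sided uniform convergence vacuous.

```python
import numpy as np

rng = np.random.default_rng(0)
m, D, s = 50, 10_000, 0.1   # training set size, noise dimension, noise scale (chosen for illustration)


def sample(n):
    """Draw n points: one 'signal' coordinate equal to the label, plus D noisy coordinates."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = np.concatenate([y[:, None], s * rng.standard_normal((n, D))], axis=1)
    return X, y


# Training set S. One step of gradient descent on the squared loss from zero
# initialization gives a weight vector proportional to the label-weighted mean
# of the training points, so it picks up each training point's noise.
X, y = sample(m)
w = (y[:, None] * X).mean(axis=0)

err = lambda A, b: np.mean(np.sign(A @ w) != b)

Xt, yt = sample(1_000)              # fresh test set
print("train error:", err(X, y))    # ~0.0
print("test  error:", err(Xt, yt))  # ~0.0  (the classifier generalizes)

# S': same labels and signal coordinate as S, but with the noise coordinates
# negated. S' is distributed exactly like a fresh training set, yet the
# classifier gets essentially all of it wrong, because its memorized noise
# now points the wrong way.
Xp = X.copy()
Xp[:, 1:] *= -1.0
print("error on S':", err(Xp, y))   # ~1.0
```

By symmetry, the classifier that GD would learn from $S'$ has low test error but misclassifies essentially all of $S$, so the supremum of $|\mathcal{L}_{\mathcal{D}}(h) - \hat{\mathcal{L}}_S(h)|$ over the classifiers GD can output is close to $1$ even though each of those classifiers generalizes well.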