NeurIPS Poster Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Yuan Cao · Quanquan Gu

East Exhibition Hall B, C #141

Keywords: [ Deep Learning ] [ Learning Theory ] [ Theory ]

Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., the number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
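
To make the idea of a random feature model induced by the network gradient at initialization more concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the construction analyzed in the paper: it uses a two-layer ReLU network with fixed random output signs rather than a deep network, and the names `init_params`, `network`, `ntrf_features`, and `ntrf_model` are hypothetical. The features of an input are the gradient of the network output with respect to the hidden weights at initialization, and the model is linear in a weight offset `V`.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def init_params(d, m, seed=0):
    """Gaussian hidden weights W0 (m x d) and fixed random output signs a (assumed setup)."""
    rng = random.Random(seed)
    W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    return W0, a


def network(W, a, x):
    """Two-layer ReLU network: f_W(x) = m^{-1/2} * sum_j a_j * relu(w_j . x)."""
    m = len(W)
    return sum(aj * max(dot(wj, x), 0.0) for aj, wj in zip(a, W)) / math.sqrt(m)


def ntrf_features(W0, a, x):
    """Gradient of f_W(x) with respect to W, evaluated at W0 and flattened to length m*d."""
    m = len(W0)
    feats = []
    for aj, wj in zip(a, W0):
        gate = 1.0 if dot(wj, x) > 0 else 0.0  # ReLU derivative at initialization
        feats.extend(aj * gate * xi / math.sqrt(m) for xi in x)
    return feats


def ntrf_model(W0, a, x, V):
    """Linearized model: f_{W0}(x) + <grad_W f_{W0}(x), V>, linear in the offset V."""
    flat_V = [v for row in V for v in row]
    return network(W0, a, x) + dot(ntrf_features(W0, a, x), flat_V)


d, m = 5, 64
W0, a = init_params(d, m)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
x_norm = math.sqrt(dot(x, x))
x = [xi / x_norm for xi in x]  # unit-norm input

# A small offset from initialization; the linearized model should track the network closely.
V = [[1e-6 * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
W1 = [[w + v for w, v in zip(w_row, v_row)] for w_row, v_row in zip(W0, V)]

gap = abs(network(W1, a, x) - ntrf_model(W0, a, x, V))
print("feature dimension:", len(ntrf_features(W0, a, x)))
print("gap between network and linearized model:", gap)
```

In this toy setting the gap stays tiny because a small offset rarely changes which ReLU units are active, so the network behaves almost exactly like its linearization around the initial weights.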
