Poster
Generalization Bounds for Neural Networks via Approximate Description Length
Amit Daniely · Elad Granot
East Exhibition Hall B, C #228
Keywords: [ Learning Theory ] [ Theory ]
[
Abstract
]
Abstract:
We investigate the sample complexity of networks with bounds on the magnitude of its weights.
In particular, we consider the class
where the spectral norm of each is bounded by , the Frobenius norm is bounded by , and is the sigmoid function or the smoothened ReLU function .
We show that for any depth , if the inputs are in , the sample complexity of is . This bound is optimal up to log-factors, and substantially improves over the previous state of the art of , that was established in a recent line of work.
We furthermore show that this bound remains valid if instead of considering the magnitude of the 's, we consider the magnitude of , where are some reference matrices, with spectral norm of . By taking the to be the matrices in the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters, in many {\em typical} regimes of parameters.
To establish our results we develop a new technique to analyze the sample complexity of families of predictors.
We start by defining a new notion of a randomized approximate description of functions . We then show that if there is a way to approximately describe functions in a class using bits, then examples suffices to guarantee uniform convergence. Namely, that the empirical loss of all the functions in the class is -close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions.
Live content is unavailable. Log in and register to view live content