Poster
Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks
Grant Rotskoff · Eric Vanden-Eijnden
Room 210 #66
Keywords: [ Large Deviations and Asymptotic Analysis ] [ Statistical Physics of Learning ]
Abstract:
The performance of neural networks on high-dimensional data
distributions suggests that it may be possible to parameterize a
representation of a given high-dimensional function with
controllably small errors, potentially outperforming standard
interpolation methods. We demonstrate, both theoretically and
numerically, that this is indeed the case. We map the parameters of
a neural network to a system of particles relaxing with an
interaction potential determined by the loss function. We show that
in the limit where the number of parameters $n$ is large, the
landscape of the mean-squared error becomes convex and the
representation error in the function scales as $O(n^{-1})$.
In this limit, we prove a dynamical variant of the universal
approximation theorem showing that the optimal
representation can be attained by stochastic gradient
descent, the algorithm ubiquitously used for parameter optimization
in machine learning. In the asymptotic regime, we study the
fluctuations around the optimal representation and show that they
arise at a scale $O(n^{-1})$. These fluctuations in the landscape
identify the natural scale for the noise in stochastic gradient
descent. Our results apply to both single- and multi-layer neural
networks, as well as standard kernel methods like radial basis
functions.
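To make the particle analogy concrete, here is a minimal sketch of the mean-field parameterization the abstract appeals to; the symbols $\varphi$, $F$, and $K$ below are illustrative notation rather than the authors' exact definitions. Writing the network output as an empirical average over $n$ parameter "particles" $\theta_i = (c_i, z_i)$,
$$f_n(x) = \frac{1}{n}\sum_{i=1}^{n} c_i\,\varphi(x, z_i),$$
the population mean-squared error against a target $f$ decomposes into a constant, a single-particle potential, and a pairwise interaction,
$$\mathbb{E}_x\!\left[\big(f(x) - f_n(x)\big)^2\right] = \mathbb{E}_x\!\left[f(x)^2\right] - \frac{2}{n}\sum_{i=1}^{n} F(\theta_i) + \frac{1}{n^2}\sum_{i,j=1}^{n} K(\theta_i, \theta_j),$$
with $F(\theta) = \mathbb{E}_x[f(x)\,c\,\varphi(x, z)]$ and $K(\theta, \theta') = \mathbb{E}_x[c\,c'\,\varphi(x, z)\,\varphi(x, z')]$. Under this reading, (stochastic) gradient descent on the loss relaxes the $n$ particles in the external potential $-\tfrac{2}{n}F$ with the pairwise interaction $\tfrac{1}{n^2}K$, which is the mapping the abstract refers to.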