

Surprising Instabilities in Training Deep Networks and a Theoretical Analysis

Yuxin Sun · Dong Lao · Ganesh Sundaramoorthi · Anthony Yezzi

Hall J (level 1) #504

Keywords: [ Deep Learning Theory, Stability Analysis ]


We empirically demonstrate numerical instabilities in training standard deep networks with SGD. Specifically, we show that numerical error induced by floating-point arithmetic (on the order of the smallest floating-point bit) can be amplified substantially during training and can produce test accuracy variance comparable to that caused by the stochasticity of SGD. We show that this can likely be traced to instabilities of the optimization dynamics that are localized over iterations and over regions of the weight tensor space. To do so, we present a theoretical framework based on the numerical analysis of partial differential equations (PDEs) and analyze the gradient descent PDE of a one-layer convolutional neural network, which is sufficient to illustrate these instabilities. We show that the gradient descent PDE is stable only under certain conditions on the learning rate and weight decay. We reproduce the localized instabilities in the PDE for the one-layer network, which arise when these conditions are violated.
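
As a rough illustration of how a stability condition can couple the learning rate and weight decay, the sketch below runs plain gradient descent on a scalar quadratic loss. This is a textbook toy and an assumption made here for illustration, not the one-layer convolutional PDE analyzed in the paper; the function names (`gd_quadratic`, `is_stable`, `perturbation_growth`) and the parameter `curvature` are hypothetical. In this toy, each step multiplies the weight by `1 - lr * (curvature + weight_decay)`, so a perturbation of one floating-point bit shrinks when that factor has magnitude at most 1 and grows geometrically otherwise.

```python
# Toy model (assumption, not the paper's PDE): gradient descent on
#   L(w) = 0.5 * curvature * w**2 + 0.5 * weight_decay * w**2
# Each update is w <- (1 - lr * (curvature + weight_decay)) * w.

EPS = 2.0 ** -52  # spacing between 1.0 and the next larger double


def gd_quadratic(w0, lr, curvature, weight_decay, steps):
    """Run `steps` gradient descent updates starting from w0."""
    w = w0
    for _ in range(steps):
        grad = (curvature + weight_decay) * w
        w = w - lr * grad
    return w


def is_stable(lr, curvature, weight_decay):
    """True if the per-step amplification factor has magnitude <= 1."""
    return abs(1.0 - lr * (curvature + weight_decay)) <= 1.0


def perturbation_growth(lr, curvature, weight_decay, steps):
    """Gap between two runs whose initial weights differ by one bit."""
    a = gd_quadratic(1.0, lr, curvature, weight_decay, steps)
    b = gd_quadratic(1.0 + EPS, lr, curvature, weight_decay, steps)
    return abs(a - b)


if __name__ == "__main__":
    # Same curvature and step count, two learning rates.
    print("stable lr  :", perturbation_growth(0.125, 4.0, 0.0, 60))
    print("unstable lr:", perturbation_growth(0.75, 4.0, 0.0, 60))
    # Weight decay alone can push a fixed learning rate past the threshold.
    print(is_stable(0.4, 4.0, 0.0), is_stable(0.4, 4.0, 2.0))
```

In this toy the gap in the stable setting stays below the initial one-bit perturbation, while in the unstable setting it grows by many orders of magnitude. The instabilities described in the abstract are localized over iterations and regions of the weight space, which a scalar linear model like this cannot capture.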
