
Workshop: Heavy Tails in ML: Structure, Stability, Dynamics

Revisiting the Noise Model of SGD

Barak Battash · Ofir Lindenbaum

Keywords: [ stochastic gradient noise (SGN) ] [ Lévy noise ] [ stochastic gradient descent (SGD) ]


The effectiveness of stochastic gradient descent (SGD) is significantly influenced by stochastic gradient noise (SGN). Following the central limit theorem, SGN was initially modeled as Gaussian, but Simsekli et al. recently demonstrated that a symmetric α-stable (SαS) Lévy distribution characterizes it better. Here, we revisit the noise model of SGD and provide robust, comprehensive empirical evidence that SGN is heavy-tailed and is better represented by the SαS distribution. Furthermore, we argue that different deep neural network (DNN) parameters preserve distinct SGN properties throughout training. We develop a novel framework based on a Lévy-driven stochastic differential equation (SDE), in which a one-dimensional Lévy process describes each DNN parameter. This leads to a more accurate characterization of the dynamics of SGD around local minima.
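To make the per-parameter picture concrete, the following is a minimal sketch, not the authors' implementation, of an Euler-type discretization of a Lévy-driven SDE in which each coordinate is driven by its own one-dimensional SαS process. The quadratic loss, the step size `eta`, the noise scale `eps`, the function names, and the use of the Chambers-Mallows-Stuck sampler are all illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def sample_sas(alpha, rng):
    """Draw one standard SaS sample per entry of `alpha` (0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck construction for the symmetric case.
    alpha = 2 gives a Gaussian with variance 2; alpha = 1 gives a Cauchy.
    """
    alpha = np.asarray(alpha, dtype=float)
    u = rng.uniform(-np.pi / 2, np.pi / 2, size=alpha.shape)
    w = rng.exponential(1.0, size=alpha.shape)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def simulate(alphas, steps=1000, eta=0.01, eps=0.1, seed=0):
    """Simulate theta_i under d(theta_i) = -theta_i dt + eps dL_t^(alpha_i).

    Assumption: the loss is the toy quadratic 0.5 * ||theta||^2, so the
    gradient is theta itself. Each coordinate has its own tail index alpha_i.
    """
    rng = np.random.default_rng(seed)
    alphas = np.asarray(alphas, dtype=float)
    theta = np.ones_like(alphas)
    path = np.empty((steps + 1, alphas.size))
    path[0] = theta
    for k in range(steps):
        grad = theta  # gradient of the assumed quadratic loss
        # An alpha-stable increment over time eta scales as eta**(1/alpha).
        jump = eps * eta ** (1.0 / alphas) * sample_sas(alphas, rng)
        theta = theta - eta * grad + jump
        path[k + 1] = theta
    return path


# Coordinate 0 sees Gaussian noise, coordinate 1 sees heavy-tailed noise.
path = simulate([2.0, 1.2])
```

In this sketch the coordinate with the smaller tail index occasionally makes large jumps, while the coordinate with `alpha = 2` moves diffusively, which is the qualitative distinction between parameters that the abstract describes.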
