Timezone: »

On the Power of Differentiable Learning versus PAC and SQ Learning
Emmanuel Abbe · Pritish Kamath · Eran Malach · Colin Sandon · Nathan Srebro

@ None
We study the power of learning via mini-batch stochastic gradient descent (SGD) on the loss of a differentiable model or neural network, and ask what learning problems can be learnt using this paradigm. We show that SGD can always simulate learning with statistical queries (SQ), but its ability to go beyond that depends on the precision $\rho$ of the gradients and the minibatch size $b$. With fine enough precision relative to minibatch size, namely when $b \rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Moreover, with polynomially many bits of precision (i.e. when $\rho$ is exponentially small), SGD can simulate PAC learning regardless of the batch size. On the other hand, when $b \rho^2$ is large enough, the power of SGD is equivalent to that of SQ learning.

Author Information

Emmanuel Abbe (Swiss Federal Institute of Technology Lausanne)
Pritish Kamath (Toyota Technological Institute at Chicago)
Eran Malach (Hebrew University Jerusalem Israel)
Colin Sandon (MIT)
Nathan Srebro (University of Toronto)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors