Timezone: »

Neural Networks Learning and Memorization with (almost) no Over-Parameterization
Amit Daniely

Thu Dec 10 09:00 PM -- 11:00 PM (PST) @ Poster Session 6 #1732
Many results in recent years established polynomial time learnability of various models via neural networks algorithms (e.g. \cite{andoni2014learning, daniely2016toward, daniely2017sgd, cao2019generalization, ziwei2019polylogarithmic, zou2019improved, ma2019comparative, du2018gradient, arora2019fine, song2019quadratic, oymak2019towards, ge2019mildly, brutzkus2018sgd}). However, unless the model is linear separable~\cite{brutzkus2018sgd}, or the activation is a polynomial~\cite{ge2019mildly}, these results require very large networks -- much more than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with {\em near optimal} network size, sample complexity, and runtime. In particular, we show that SGD on depth two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\sphere^{d-1}$.

Author Information

Amit Daniely (Hebrew University and Google Research)

More from the Same Authors