Timezone: »

Poster
Network size and size of the weights in memorization with two-layers neural networks
Sebastien Bubeck · Ronen Eldan · Yin Tat Lee · Dan Mikulincer

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #392
In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\R^d$ using only $\ulcorner n/d \urcorner$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right)$ neurons (assuming that the data is well dispersed too). We show however that these constructions give rise to networks where the \emph{magnitude} of the neurons' weights are far from optimal. In contrast we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.