Poster
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun · Suvrit Sra · Ali Jadbabaie
East Exhibition Hall B, C #233
Keywords: [ Learning Theory ] [ Theory ] [ Optimization for Deep Networks; Theory ] [ Deep Learning ]
Abstract:
We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require N hidden nodes to memorize/interpolate arbitrary N data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with Ω(√N) hidden nodes can perfectly memorize most datasets with N points. We also prove that width Θ(√N) is necessary and sufficient for memorizing N data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an L-layer network with W parameters in the hidden layers can memorize N data points if W = Ω(N). Combined with a recent upper bound O(WL log W) on VC dimension, our construction is nearly tight for any fixed L. Subsequently, we analyze memorization capacity of residual networks under a general position assumption; we prove results that substantially reduce the known requirement of N hidden nodes. Finally, we study the dynamics of stochastic gradient descent (SGD), and show that when initialized near a memorizing global minimum of the empirical risk, SGD quickly finds a nearby point with much smaller empirical risk.
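The memorization claim can be probed empirically with a small experiment: fit N randomly labeled points using a 3-layer ReLU network (interpreted here as two hidden layers) whose widths scale like √N. The sketch below is illustrative only, not the paper's construction or proof; PyTorch, the width constant, the optimizer, and all hyperparameters are arbitrary assumptions.

```python
# Minimal empirical sketch (not the paper's explicit construction):
# train a ReLU network with two hidden layers of width ~ sqrt(N)
# to fit N generic points with arbitrary binary labels.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
N, d = 1024, 16                         # number of points and input dimension
width = 4 * int(math.sqrt(N))           # hidden width on the order of sqrt(N)

X = torch.randn(N, d)                   # generic ("most datasets") random inputs
y = torch.randint(0, 2, (N,)).float()   # arbitrary binary labels

model = nn.Sequential(                  # 3-layer ReLU network (2 hidden layers)
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(5000):                # full-batch training until the labels are (nearly) memorized
    opt.zero_grad()
    logits = model(X).squeeze(1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()

train_err = ((logits > 0).float() != y).float().mean().item()
print(f"final loss {loss.item():.4f}, training error {train_err:.3f}")
```

With these settings the training error typically drops to (or near) zero even though the network has far fewer than N hidden nodes, which is the phenomenon the paper's bounds formalize.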