Natural environments have temporal structure at multiple timescales, a property that is reflected in biological learning and memory but typically not in machine learning systems. This paper advances a multiscale learning model in which each weight in a neural network is a sum of subweights learning independently at different timescales. A special case of this model is a fast-weights scheme, in which each original weight is augmented with a fast weight that rapidly learns and decays, enabling adaptation to distribution shifts during online learning. We then prove that more complicated models that assume coupling between timescales are equivalent to the multiscale learner, via a reparameterization that eliminates the coupling. Finally, we prove that momentum learning is equivalent to fast weights with a negative learning rate, offering a new perspective on how and when momentum is beneficial.
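The two constructions described in the abstract can be sketched in a few lines of NumPy. The first block below is a minimal illustration of the fast-weights special case, assuming a scalar weight, a squared-error objective, and illustrative hyperparameters (`eta_slow`, `eta_fast`, `decay`) chosen here rather than taken from the paper: the fast subweight takes large, decaying steps that track a shifting data distribution, while the slow subweight accumulates stable structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters, not taken from the paper.
eta_slow, eta_fast, decay = 0.01, 0.2, 0.9
w_slow, w_fast = 0.0, 0.0  # the effective weight is their sum

def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

for t in range(2000):
    # Nonstationary stream: the true weight flips sign every 500 steps,
    # mimicking a distribution shift during online learning.
    w_true = 1.0 if (t // 500) % 2 == 0 else -1.0
    x = rng.normal()
    y = w_true * x + 0.1 * rng.normal()

    # Both subweights see the same gradient, taken at the effective weight.
    g = grad(w_slow + w_fast, x, y)
    w_slow -= eta_slow * g                   # slow timescale: small steps, no decay
    w_fast = decay * w_fast - eta_fast * g   # fast timescale: large steps, decays to 0

print("effective weight after training:", w_slow + w_fast)
```

The second block numerically checks the momentum equivalence for the common heavy-ball form (`m <- beta*m + g`; `w <- w - eta*m`). The mapping used here (decay `beta`, slow rate `eta/(1-beta)`, fast rate `-eta*beta/(1-beta)`) is one reconstruction consistent with the abstract's claim, not necessarily the paper's exact parameterization; note that the fast learning rate is negative.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, beta = 0.1, 0.9

def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

w_m, m = 0.0, 0.0            # heavy-ball momentum learner
w_slow, w_fast = 0.0, 0.0    # multiscale learner, effective weight = sum
eta_slow = eta / (1 - beta)
eta_fast = -eta * beta / (1 - beta)  # negative fast learning rate

for t in range(1000):
    x = rng.normal()
    y = 2.0 * x + 0.1 * rng.normal()

    # Momentum update.
    g = grad(w_m, x, y)
    m = beta * m + g
    w_m -= eta * m

    # Multiscale update on the same example; the gradients coincide
    # because the effective weights coincide at every step.
    g2 = grad(w_slow + w_fast, x, y)
    w_slow -= eta_slow * g2
    w_fast = beta * w_fast - eta_fast * g2

# The two trajectories agree up to floating-point rounding.
print(w_m, w_slow + w_fast)
```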
Author Information
Matt Jones (Google Brain)
More from the Same Authors
- 2022: Neural Network Online Training with Sensitivity to Multiscale Temporal Structure
  Matt Jones · Tyler Scott · Gamaleldin Elsayed · Mengye Ren · Katherine Hermann · David Mayo · Michael Mozer