Neural Network Online Training with Sensitivity to Multiscale Temporal Structure
Matt Jones · Tyler Scott · Gamaleldin Elsayed · Mengye Ren · Katherine Hermann · David Mayo · Michael Mozer
Event URL: https://openreview.net/forum?id=j78KgxoldSw
Many online-learning domains in artificial intelligence involve data with nonstationarities spanning a wide range of timescales. Heuristic approaches to nonstationarity include retraining models frequently with only the freshest data and using iterative gradient-based updating methods that implicitly discount older data. We propose an alternative approach based on Bayesian inference over $1/f$ noise. The method is cast as a Kalman filter that posits latent variables with various characteristic timescales and maintains a joint posterior over them. We also derive a variational approximation that tracks these variables independently. The variational method can be implemented as a drop-in optimizer for any neural network architecture; it works by decomposing each weight into a sum of subweights with different decay rates. We test these methods on two synthetic online-learning tasks whose environmental parameters vary over time according to $1/f$ noise. Baseline methods based on finite memory show a nonmonotonic relationship between memory horizon and performance, a signature of data going "stale." The Bayesian and variational methods perform significantly better by leveraging all past data and performing appropriate inference at all timescales.
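
The idea of decomposing each weight into subweights with different decay rates can be illustrated with a minimal sketch. This is not the authors' optimizer: the function names, the plain gradient step, and the single learning rate shared across timescales are all assumptions made for illustration, and the variational method in the paper may set per-timescale step sizes differently. The sketch only shows the structural point that the effective weight is a sum of components, each pulled toward zero at its own rate.

```python
import numpy as np

def init_subweights(shape, n_scales):
    """Hypothetical helper: one zero-initialized subweight array per timescale."""
    return np.zeros((n_scales,) + tuple(shape))

def effective_weight(subweights):
    """The weight the network actually uses is the sum over timescales."""
    return subweights.sum(axis=0)

def multiscale_step(subweights, grad, decay_rates, lr):
    """One illustrative update (assumed form, not taken from the paper).

    Because the effective weight is a plain sum, the loss gradient with
    respect to every subweight equals the gradient with respect to the
    effective weight. Each subweight first decays toward zero at its own
    rate, then takes the same gradient step.
    """
    # Reshape decay rates to (n_scales, 1, ..., 1) so they broadcast over the weight shape.
    decay = np.asarray(decay_rates, dtype=float).reshape(
        (-1,) + (1,) * (subweights.ndim - 1)
    )
    return (1.0 - decay) * subweights - lr * grad

# Usage: three timescales, from no decay (slow) to strong decay (fast).
decay_rates = [0.0, 0.1, 0.5]
subweights = init_subweights((2,), n_scales=len(decay_rates))
grad = np.array([1.0, -2.0])  # made-up gradient w.r.t. the effective weight
subweights = multiscale_step(subweights, grad, decay_rates, lr=0.1)
w_after_update = effective_weight(subweights)

# With no further gradient signal, fast components fade while slow ones persist.
subweights = multiscale_step(subweights, np.zeros(2), decay_rates, lr=0.1)
w_after_decay = effective_weight(subweights)
```

In this toy run, the component with decay rate 0.5 loses half its value in one step while the component with decay rate 0.0 is unchanged, so recent evidence is partly forgotten quickly and partly retained at longer timescales.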

Author Information

Matt Jones (Google Brain)
Tyler Scott (University of Colorado, Boulder)
Gamaleldin Elsayed (Google Research, Brain Team)
Mengye Ren (NYU)
Katherine Hermann (Google)
David Mayo (MIT)
Michael Mozer (Google Research, Brain Team)