Workshop: Memory in Artificial and Real Intelligence (MemARI)

Neural Network Online Training with Sensitivity to Multiscale Temporal Structure

Matt Jones · Tyler Scott · Gamaleldin Elsayed · Mengye Ren · Katherine Hermann · David Mayo · Michael Mozer

Keywords: [ extended Kalman filter ] [ Bayesian models ] [ Online Learning ] [ optimizers ] [ 1/f noise ] [ Distribution Shift ]

Abstract: Many online-learning domains in artificial intelligence involve data with nonstationarities spanning a wide range of timescales. Heuristic approaches to nonstationarity include frequently retraining models on only the freshest data and using iterative gradient-based update methods that implicitly discount older data. We propose an alternative approach based on Bayesian inference over $1/f$ noise. The method is cast as a Kalman filter that posits latent variables with various characteristic timescales and maintains a joint posterior over them. We also derive a variational approximation that tracks these variables independently. The variational method can be implemented as a drop-in optimizer for any neural network architecture; it works by decomposing each weight into a sum of subweights with different decay rates. We test these methods on two synthetic online-learning tasks whose environmental parameters vary over time according to $1/f$ noise. Baseline methods based on finite memory show a nonmonotonic relationship between memory horizon and performance, a signature of data going "stale." The Bayesian and variational methods perform significantly better by leveraging all past data and performing appropriate inference at all timescales.
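The weight decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' variational update rule, which the abstract does not specify. The class name `MultiscaleWeights`, the helper `make_timescales`, the geometric spacing of timescales, the exponential decay factor `exp(-1/tau)`, and the even split of the learning rate across subweights are all assumptions made for illustration.

```python
import numpy as np


def make_timescales(num_scales=4, base=10.0):
    """Hypothetical helper: geometrically spaced timescales 1, base, base**2, ..."""
    return base ** np.arange(num_scales)


class MultiscaleWeights:
    """Illustrative only: a weight tensor stored as a sum of decaying subweights."""

    def __init__(self, shape, timescales, lr=0.1):
        self.timescales = np.asarray(timescales, dtype=float)
        # Assumed decay rule: each subweight shrinks by exp(-1/tau) per step,
        # so short-timescale subweights forget quickly and long ones slowly.
        self.decay = np.exp(-1.0 / self.timescales)
        self.sub = np.zeros((len(self.timescales),) + tuple(shape))
        self.lr = lr

    @property
    def weight(self):
        # The effective weight used by the network is the sum of the subweights.
        return self.sub.sum(axis=0)

    def step(self, grad):
        grad = np.asarray(grad, dtype=float)
        # Reshape decay factors so they broadcast over the weight dimensions.
        d = self.decay.reshape((-1,) + (1,) * grad.ndim)
        # Assumed update: decay every subweight, then apply an equal share
        # of a plain gradient step to each one.
        share = self.lr / len(self.timescales)
        self.sub = d * self.sub - share * grad


if __name__ == "__main__":
    w = MultiscaleWeights((3,), make_timescales(3), lr=0.1)
    w.step(np.ones(3))   # one gradient step
    w.step(np.zeros(3))  # no new evidence: fast subweights fade first
    print(w.sub[:, 0])
```

In this sketch, a step with zero gradient shrinks the short-timescale subweights more than the long-timescale ones, which is one simple way to discount older data at several rates at once.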
