Neural Network Online Training with Sensitivity to Multiscale Temporal Structure
Matt Jones · Tyler Scott · Gamaleldin Elsayed · Mengye Ren · Katherine Hermann · David Mayo · Michael Mozer
Event URL: https://openreview.net/forum?id=j78KgxoldSw »
Many online-learning domains in artificial intelligence involve data with nonstationarities spanning a wide range of timescales. Heuristic approaches to nonstationarity include retraining models frequently with only the freshest data and using iterative gradient-based updating methods that implicitly discount older data. We propose an alternative approach based on Bayesian inference over $1/f$ noise. The method is cast as a Kalman filter that posits latent variables with various characteristic timescales and maintains a joint posterior over them. We also derive a variational approximation that tracks these variables independently. The variational method can be implemented as a drop-in optimizer for any neural network architecture, which works by decomposing each weight as a sum of subweights with different decay rates. We test these methods on two synthetic online-learning tasks with environmental parameters varying across time according to $1/f$ noise. Baseline methods based on finite memory show a nonmonotonic relationship between memory horizon and performance, a signature of data going "stale." The Bayesian and variational methods perform significantly better by leveraging all past data and performing appropriate inference at all timescales.
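To make the abstract's implementation idea concrete, the variational method's "drop-in optimizer" decomposes each network weight into a sum of subweights with different decay rates, so fast-decaying components track recent data while slow-decaying components retain long-timescale structure. Below is a minimal, illustrative sketch assuming a PyTorch-style optimizer interface; the class name `MultiscaleSGD`, the geometric spacing of decay rates, and the single shared learning rate are assumptions made for illustration, not the authors' released implementation.

```python
import torch


class MultiscaleSGD(torch.optim.Optimizer):
    """Treat each parameter as a sum of K subweights, each with its own decay
    rate (characteristic timescale). Illustrative sketch only, not the paper's code."""

    def __init__(self, params, lr=1e-2, num_scales=4, base_decay=0.5):
        defaults = dict(lr=lr, num_scales=num_scales, base_decay=base_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr, K, base = group["lr"], group["num_scales"], group["base_decay"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Split the current weight evenly across subweights.
                    state["sub"] = [p.detach().clone() / K for _ in range(K)]
                    # Geometrically spaced decay rates, fastest first (assumed
                    # spacing; the paper derives timescales from the 1/f model).
                    state["decay"] = [base ** (k + 1) for k in range(K)]
                new_p = torch.zeros_like(p)
                for w_k, d_k in zip(state["sub"], state["decay"]):
                    # Each subweight sees the gradient of the summed weight,
                    # then shrinks toward zero at its own timescale.
                    w_k.add_(p.grad, alpha=-lr)
                    w_k.mul_(1.0 - d_k)
                    new_p += w_k
                p.copy_(new_p)
        return loss
```

Usage would mirror any standard optimizer, e.g. `MultiscaleSGD(model.parameters(), lr=1e-2, num_scales=4)`; the effective weight at every step is simply the sum of its subweights.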
Author Information
Matt Jones (Google Brain)
Tyler Scott (University of Colorado, Boulder)
Gamaleldin Elsayed (Google Research, Brain Team)
Mengye Ren (NYU)
Katherine Hermann (Google)
David Mayo (MIT)
Michael Mozer (Google Research, Brain Team)
More from the Same Authors
- 2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
  Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer
- 2021 : Learning Neural Causal Models with Active Interventions »
  Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke
- 2022 : Learning at Multiple Timescales »
  Matt Jones
- 2022 : Learning to Reason With Relational Abstractions »
  Andrew Nam · James McClelland · Mengye Ren · Chelsea Finn
- 2022 : Spatial Symmetry in Slot Attention »
  Ondrej Biza · Sjoerd van Steenkiste · Mehdi S. M. Sajjadi · Gamaleldin Elsayed · Aravindh Mahendran · Thomas Kipf
- 2022 : Workshop version: How hard are computer vision datasets? Calibrating dataset difficulty to viewing time »
  David Mayo · Jesse Cummings · Xinyu Lin · Dan Gutfreund · Boris Katz · Andrei Barbu
- 2022 : Teacher-generated pseudo human spatial-attention labels boost contrastive learning models »
  Yushi Yao · Chang Ye · Junfeng He · Gamaleldin Elsayed
- 2022 : An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better? »
  Tyler Scott · Ting Liu · Michael Mozer · Andrew Gallagher
- 2022 : Image recognition time for humans predicts adversarial vulnerability for models »
  David Mayo · Jesse Cummings · Xinyu Lin · Boris Katz · Andrei Barbu
- 2022 Poster: SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos »
  Gamaleldin Elsayed · Aravindh Mahendran · Sjoerd van Steenkiste · Klaus Greff · Michael Mozer · Thomas Kipf
- 2021 : On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation »
  Binxu Wang · David Mayo · Arturo Deza · Andrei Barbu · Colin Conwell
- 2021 Poster: Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss »
  Michael Iuzzolino · Michael Mozer · Samy Bengio
- 2021 Poster: Neural Regression, Representational Similarity, Model Zoology & Neural Taskonomy at Scale in Rodent Visual Cortex »
  Colin Conwell · David Mayo · Andrei Barbu · Michael Buice · George Alvarez · Boris Katz
- 2021 Poster: Soft Calibration Objectives for Neural Networks »
  Archit Karandikar · Nicholas Cain · Dustin Tran · Balaji Lakshminarayanan · Jonathon Shlens · Michael Mozer · Becca Roelofs
- 2021 Poster: Neural Production Systems »
  Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio
- 2021 Poster: Discrete-Valued Neural Communication »
  Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio
- 2019 Poster: ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models »
  Andrei Barbu · David Mayo · Julian Alverio · William Luo · Christopher Wang · Dan Gutfreund · Josh Tenenbaum · Boris Katz
- 2019 Poster: Saccader: Improving Accuracy of Hard Attention Models for Vision »
  Gamaleldin Elsayed · Simon Kornblith · Quoc V Le
- 2018 Poster: Large Margin Deep Networks for Classification »
  Gamaleldin Elsayed · Dilip Krishnan · Hossein Mobahi · Kevin Regan · Samy Bengio
- 2018 Poster: Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning »
  Tyler Scott · Karl Ridgeway · Michael Mozer
- 2018 Spotlight: Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning »
  Tyler Scott · Karl Ridgeway · Michael Mozer
- 2018 Poster: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans »
  Gamaleldin Elsayed · Shreya Shankar · Brian Cheung · Nicolas Papernot · Alexey Kurakin · Ian Goodfellow · Jascha Sohl-Dickstein