Timezone: »
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of ``fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
Author Information
Michael Zhang (University of Toronto / Vector Institute)
PhD student at the University of Toronto
James Lucas (University of Toronto)
Jimmy Ba (University of Toronto / Vector Institute)
Geoffrey E Hinton (Google & University of Toronto)
Geoffrey Hinton received his PhD in Artificial Intelligence from Edinburgh in 1978 and spent five years as a faculty member at Carnegie-Mellon where he pioneered back-propagation, Boltzmann machines and distributed representations of words. In 1987 he became a fellow of the Canadian Institute for Advanced Research and moved to the University of Toronto. In 1998 he founded the Gatsby Computational Neuroscience Unit at University College London, returning to the University of Toronto in 2001. His group at the University of Toronto then used deep learning to change the way speech recognition and object recognition are done. He currently splits his time between the University of Toronto and Google. In 2010 he received the NSERC Herzberg Gold Medal, Canada's top award in Science and Engineering.
More from the Same Authors
-
2021 Spotlight: Neural Additive Models: Interpretable Machine Learning with Neural Nets »
Rishabh Agarwal · Levi Melnick · Nicholas Frosst · Xuezhou Zhang · Ben Lengerich · Rich Caruana · Geoffrey Hinton -
2021 : BLAST: Latent Dynamics Models from Bootstrapping »
Keiran Paster · Lev McKinney · Sheila McIlraith · Jimmy Ba -
2022 Poster: Optimizing Data Collection for Machine Learning »
Rafid Mahmood · James Lucas · Jose M. Alvarez · Sanja Fidler · Marc Law -
2022 : Large Language Models Are Human-Level Prompt Engineers »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Temporary Goals for Exploration »
Haoyang Xu · Jimmy Ba · Silviu Pitis · Harris Chan -
2022 : Return Augmentation gives Supervised RL Temporal Compositionality »
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba -
2022 : Guiding Exploration Towards Impactful Actions »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2022 : Steering Large Language Models using APE »
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Keiran Paster · Silviu Pitis · Harris Chan · Jimmy Ba -
2022 : Rational Multi-Objective Agents Must Admit Non-Markov Reward Representations »
Silviu Pitis · Duncan Bailey · Jimmy Ba -
2023 Poster: Laying the Foundation for an Instruction-Following Generalist Agent in Minecraft »
Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith -
2023 Poster: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu -
2023 Poster: AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback »
Yann Dubois · Xuechen Li · Rohan Taori · Tianyi Zhang · Ishaan Gulrajani · Jimmy Ba · Carlos Guestrin · Percy Liang · Tatsunori Hashimoto -
2022 : Invited Talk by Jimmy Ba »
Jimmy Ba -
2022 Invited Talk: The Forward-Forward Algorithm for Training Deep Neural Networks »
Geoffrey Hinton -
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation »
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang -
2022 Poster: You Can’t Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2022 Poster: A Unified Sequence Interface for Vision Tasks »
Ting Chen · Saurabh Saxena · Lala Li · Tsung-Yi Lin · David Fleet · Geoffrey Hinton -
2022 Poster: Dataset Distillation using Neural Feature Regression »
Yongchao Zhou · Ehsan Nezhadarya · Jimmy Ba -
2021 Poster: Clockwork Variational Autoencoders »
Vaibhav Saxena · Jimmy Ba · Danijar Hafner -
2021 Poster: Learning Domain Invariant Representations in Goal-conditioned Block MDPs »
Beining Han · Chongyi Zheng · Harris Chan · Keiran Paster · Michael Zhang · Jimmy Ba -
2021 Poster: Canonical Capsules: Self-Supervised Capsules in Canonical Pose »
Weiwei Sun · Andrea Tagliasacchi · Boyang Deng · Sara Sabour · Soroosh Yazdani · Geoffrey Hinton · Kwang Moo Yi -
2021 Poster: How does a Neural Network's Architecture Impact its Robustness to Noisy Labels? »
Jingling Li · Mozhi Zhang · Keyulu Xu · John Dickerson · Jimmy Ba -
2021 Poster: Neural Additive Models: Interpretable Machine Learning with Neural Nets »
Rishabh Agarwal · Levi Melnick · Nicholas Frosst · Xuezhou Zhang · Ben Lengerich · Rich Caruana · Geoffrey Hinton -
2020 : Contributed Talk #2: Evaluating Agents Without Rewards »
Brendon Matusch · Danijar Hafner · Jimmy Ba -
2020 : Poster Session 3 (gather.town) »
Denny Wu · Chengrun Yang · Tolga Ergen · sanae lotfi · Charles Guille-Escuret · Boris Ginsburg · Hanbake Lyu · Cong Xie · David Newton · Debraj Basu · Yewen Wang · James Lucas · MAOJIA LI · Lijun Ding · Jose Javier Gonzalez Ortiz · Reyhane Askari Hemmat · Zhiqi Bu · Neal Lawton · Kiran Thekumparampil · Jiaming Liang · Lindon Roberts · Jingyi Zhu · Dongruo Zhou -
2020 : Contributed Talk: Planning from Pixels using Inverse Dynamics Models »
Keiran Paster · Sheila McIlraith · Jimmy Ba -
2020 Session: Orals & Spotlights Track 34: Deep Learning »
Tuo Zhao · Jimmy Ba -
2020 Poster: Regularized linear autoencoders recover the principal components, eventually »
Xuchan Bao · James Lucas · Sushant Sachdeva · Roger Grosse -
2020 Poster: Big Self-Supervised Models are Strong Semi-Supervised Learners »
Ting Chen · Simon Kornblith · Kevin Swersky · Mohammad Norouzi · Geoffrey E Hinton -
2019 : James Lucas, "Information-theoretic limitations on novel task generalization" »
James Lucas -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Stacked Capsule Autoencoders »
Adam Kosiorek · Sara Sabour · Yee Whye Teh · Geoffrey E Hinton -
2019 Poster: Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks »
Qiyang Li · Saminul Haque · Cem Anil · James Lucas · Roger Grosse · Joern-Henrik Jacobsen -
2019 Poster: When does label smoothing help? »
Rafael Müller · Simon Kornblith · Geoffrey E Hinton -
2019 Spotlight: When does label smoothing help? »
Rafael Müller · Simon Kornblith · Geoffrey E Hinton -
2019 Poster: Graph Normalizing Flows »
Jenny Liu · Aviral Kumar · Jimmy Ba · Jamie Kiros · Kevin Swersky -
2019 Poster: Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse »
James Lucas · George Tucker · Roger Grosse · Mohammad Norouzi -
2018 Poster: Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures »
Sergey Bartunov · Adam Santoro · Blake Richards · Luke Marris · Geoffrey E Hinton · Timothy Lillicrap -
2018 Poster: Reversible Recurrent Neural Networks »
Matthew MacKay · Paul Vicol · Jimmy Ba · Roger Grosse -
2017 Poster: Dynamic Routing Between Capsules »
Sara Sabour · Nicholas Frosst · Geoffrey E Hinton -
2017 Spotlight: Dynamic Routing Between Capsules »
Sara Sabour · Nicholas Frosst · Geoffrey E Hinton -
2016 Poster: Attend, Infer, Repeat: Fast Scene Understanding with Generative Models »
S. M. Ali Eslami · Nicolas Heess · Theophane Weber · Yuval Tassa · David Szepesvari · koray kavukcuoglu · Geoffrey E Hinton -
2016 Poster: Using Fast Weights to Attend to the Recent Past »
Jimmy Ba · Geoffrey E Hinton · Volodymyr Mnih · Joel Leibo · Catalin Ionescu -
2016 Oral: Using Fast Weights to Attend to the Recent Past »
Jimmy Ba · Geoffrey E Hinton · Volodymyr Mnih · Joel Leibo · Catalin Ionescu -
2015 Poster: Grammar as a Foreign Language »
Oriol Vinyals · Łukasz Kaiser · Terry Koo · Slav Petrov · Ilya Sutskever · Geoffrey Hinton -
2015 Tutorial: Deep Learning »
Geoffrey E Hinton · Yoshua Bengio · Yann LeCun -
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz -
2012 Poster: ImageNet Classification with Deep Convolutional Neural Networks »
Alex Krizhevsky · Ilya Sutskever · Geoffrey E Hinton -
2012 Invited Talk: Dropout: A simple and effective way to improve neural networks »
Geoffrey E Hinton · George Dahl -
2012 Poster: A Better Way to Pre-Train Deep Boltzmann Machines »
Russ Salakhutdinov · Geoffrey E Hinton -
2012 Spotlight: ImageNet Classification with Deep Convolutional Neural Networks »
Alex Krizhevsky · Ilya Sutskever · Geoffrey E Hinton -
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng -
2010 Talk: A Probabilistic Approach to Data Visualization »
Geoffrey E Hinton -
2010 Oral: Learning to combine foveal glimpses with a third-order Boltzmann machine »
Hugo Larochelle · Geoffrey E Hinton -
2010 Poster: Learning to combine foveal glimpses with a third-order Boltzmann machine »
Hugo Larochelle · Geoffrey E Hinton -
2010 Poster: Generating more realistic images using gated MRF's »
Marc'Aurelio Ranzato · Volodymyr Mnih · Geoffrey E Hinton -
2010 Poster: Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine »
George Dahl · Marc'Aurelio Ranzato · Abdel-rahman Mohamed · Geoffrey E Hinton -
2010 Poster: Gated Softmax Classification »
Roland Memisevic · Christopher Zach · Geoffrey E Hinton · Marc Pollefeys -
2009 Workshop: Deep Learning for Speech Recognition and Related Applications »
Li Deng · Dong Yu · Geoffrey E Hinton -
2009 Poster: Replicated Softmax: an Undirected Topic Model »
Russ Salakhutdinov · Geoffrey E Hinton -
2009 Poster: 3D Object Recognition with Deep Belief Nets »
Vinod Nair · Geoffrey E Hinton -
2009 Spotlight: 3D Object Recognition with Deep Belief Nets »
Vinod Nair · Geoffrey E Hinton -
2009 Invited Talk: Deep Learning with Multiplicative Interactions »
Geoffrey E Hinton -
2009 Poster: Zero-shot Learning with Semantic Output Codes »
Mark M Palatucci · Dean Pomerleau · Geoffrey E Hinton · Tom Mitchell -
2008 Poster: Using matrices to model symbolic relationship »
Ilya Sutskever · Geoffrey E Hinton -
2008 Demonstration: Visualizing NIPS Cooperations using Multiple Maps t-SNE »
Laurens van der Maaten · Geoffrey E Hinton -
2008 Spotlight: Using matrices to model symbolic relationship »
Ilya Sutskever · Geoffrey E Hinton -
2008 Poster: The Recurrent Temporal Restricted Boltzmann Machine »
Ilya Sutskever · Geoffrey E Hinton · Graham Taylor -
2008 Poster: A Scalable Hierarchical Distributed Language Model »
Andriy Mnih · Geoffrey E Hinton -
2008 Poster: Implicit Mixtures of Restricted Boltzmann Machines »
Vinod Nair · Geoffrey E Hinton -
2008 Poster: Competing RBM density models for classification of fMRI images »
Tanya Schmah · Geoffrey E Hinton · Richard Zemel -
2007 Tutorial: Deep Belief Nets »
Geoffrey E Hinton -
2007 Poster: Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes »
Russ Salakhutdinov · Geoffrey E Hinton -
2007 Poster: Modeling image patches with a directed hierarchy of Markov random fields »
Simon Osindero · Geoffrey E Hinton -
2006 Poster: Modeling Human Motion Using Binary Latent Variables »
Graham Taylor · Geoffrey E Hinton · Sam T Roweis -
2006 Spotlight: Modeling Human Motion Using Binary Latent Variables »
Graham Taylor · Geoffrey E Hinton · Sam T Roweis