Lookahead Optimizer: k steps forward, 1 step back
Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E Hinton

Thu Dec 05:00 PM -- 07:00 PM PST @ East Exhibition Hall B + C #200

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate that Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings, on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
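The "k steps forward, 1 step back" idea can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not spell out the update rule, so the interpolation step `slow + alpha * (fast - slow)`, the function names `lookahead_sgd` and `quadratic_grad`, the toy quadratic objective, and the hyperparameter values are all assumptions chosen for illustration. Plain SGD stands in for the inner optimizer.

```python
def lookahead_sgd(grad, w0, k=5, alpha=0.5, lr=0.1, outer_steps=50):
    """Sketch of Lookahead wrapped around plain SGD (illustrative only).

    grad: hypothetical callable returning the gradient as a list of floats.
    k, alpha, lr, outer_steps: illustrative values, not tuned settings.
    """
    slow = list(w0)  # slow weights
    for _ in range(outer_steps):
        fast = list(slow)  # fast weights restart from the slow weights
        for _ in range(k):  # k steps forward with the inner optimizer
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # 1 step back: move the slow weights part of the way toward
        # the final fast weights (assumed interpolation rule)
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quadratic_grad(w):
    # Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)
    return [w[0], 10.0 * w[1]]


result = lookahead_sgd(quadratic_grad, [3.0, -2.0])
print(result)  # both coordinates should end up close to the minimum at 0
```

In this sketch the only extra state is one copy of the weights, and the only extra work is one interpolation per k inner steps, which is consistent with the abstract's claim of negligible computation and memory cost. Setting `k=1` and `alpha=1.0` reduces the loop to the inner optimizer alone.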

Author Information

Michael Zhang (University of Toronto / Vector Institute)

Undergrad at UC Berkeley, currently pursuing a PhD at the University of Toronto

James Lucas (University of Toronto)
Jimmy Ba (University of Toronto / Vector Institute)
Geoffrey E Hinton (Google & University of Toronto)

Geoffrey Hinton received his PhD in Artificial Intelligence from Edinburgh in 1978 and spent five years as a faculty member at Carnegie-Mellon where he pioneered back-propagation, Boltzmann machines and distributed representations of words. In 1987 he became a fellow of the Canadian Institute for Advanced Research and moved to the University of Toronto. In 1998 he founded the Gatsby Computational Neuroscience Unit at University College London, returning to the University of Toronto in 2001. His group at the University of Toronto then used deep learning to change the way speech recognition and object recognition are done. He currently splits his time between the University of Toronto and Google. In 2010 he received the NSERC Herzberg Gold Medal, Canada's top award in Science and Engineering.
