Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter, introducing significant memory overhead that restricts both the size of the model that can be trained and the number of examples in a mini-batch. We describe an effective and flexible adaptive optimization method with greatly reduced memory overhead. Our method retains the benefits of per-parameter adaptivity while allowing significantly larger models and batch sizes. We give convergence guarantees for our method, and demonstrate its effectiveness in training very large translation and language models with up to 2-fold speedups compared to the state-of-the-art.
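As a rough illustration of the memory trade-off described above (a sketch, not the paper's exact algorithm), the NumPy snippet below contrasts standard Adagrad, which keeps one second-moment accumulator entry per parameter, with a reduced-memory variant for a 2-D parameter that keeps only one accumulator per row and one per column and combines them with an elementwise minimum. Function names such as `row_col_adagrad_step` are hypothetical and used only for illustration.

```python
import numpy as np

def adagrad_step(w, g, acc, lr=0.1, eps=1e-8):
    """Standard Adagrad: acc has the same shape as w, doubling the memory for w."""
    acc += g ** 2
    w -= lr * g / (np.sqrt(acc) + eps)
    return w, acc

def row_col_adagrad_step(w, g, row_acc, col_acc, lr=0.1, eps=1e-8):
    """Reduced-memory variant (illustrative): for an (m x n) parameter it keeps
    only m + n accumulator entries instead of m * n."""
    # Approximate the per-entry accumulator by the tighter of the row/column bounds.
    v = np.minimum(row_acc[:, None], col_acc[None, :]) + g ** 2
    w -= lr * g / (np.sqrt(v) + eps)
    # Each row/column accumulator keeps the maximum over the entries it covers.
    row_acc = v.max(axis=1)
    col_acc = v.max(axis=0)
    return w, row_acc, col_acc

if __name__ == "__main__":
    m, n = 4, 3
    rng = np.random.default_rng(0)
    w = rng.standard_normal((m, n))
    row_acc, col_acc = np.zeros(m), np.zeros(n)
    for _ in range(5):
        g = rng.standard_normal((m, n))  # stand-in for a mini-batch gradient
        w, row_acc, col_acc = row_col_adagrad_step(w, g, row_acc, col_acc)
```

For large embedding or projection matrices, the row/column accumulators cost m + n floats rather than m * n, which is the kind of saving that permits larger models and batch sizes.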
Author Information
Rohan Anil (Google)
Vineet Gupta (Google)
Tomer Koren (Google)
Yoram Singer (Google)
More from the Same Authors
- 2021 Spotlight: Efficiently Identifying Task Groupings for Multi-Task Learning »
  Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn
- 2022: Fishy: Layerwise Fisher Approximation for Higher-order Neural Network Optimization »
  Abel Peirson · Ehsan Amid · Yatong Chen · Vladimir Feinberg · Manfred Warmuth · Rohan Anil
- 2021 Poster: Algorithmic Instabilities of Accelerated Gradient Descent »
  Amit Attia · Tomer Koren
- 2021 Oral: Optimal Rates for Random Order Online Optimization »
  Uri Sherman · Tomer Koren · Yishay Mansour
- 2021 Poster: Efficiently Identifying Task Groupings for Multi-Task Learning »
  Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn
- 2021 Poster: Towards Best-of-All-Worlds Online Learning with Feedback Graphs »
  Liad Erez · Tomer Koren
- 2021 Poster: Never Go Full Batch (in Stochastic Convex Optimization) »
  Idan Amir · Yair Carmon · Tomer Koren · Roi Livni
- 2021 Poster: Optimal Rates for Random Order Online Optimization »
  Uri Sherman · Tomer Koren · Yishay Mansour
- 2021 Poster: Asynchronous Stochastic Optimization Robust to Arbitrary Delays »
  Alon Cohen · Amit Daniely · Yoel Drori · Tomer Koren · Mariano Schain
- 2020 Poster: Stochastic Optimization with Laggard Data Pipelines »
  Naman Agarwal · Rohan Anil · Tomer Koren · Kunal Talwar · Cyril Zhang
- 2019: Private Stochastic Convex Optimization: Optimal Rates in Linear Time »
  Vitaly Feldman · Tomer Koren · Kunal Talwar
- 2019: Spotlight talks »
  Damien Scieur · Konstantin Mishchenko · Rohan Anil
- 2019: Poster Session »
  Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
- 2019 Poster: Robust Bi-Tempered Logistic Loss Based on Bregman Divergences »
  Ehsan Amid · Manfred K. Warmuth · Rohan Anil · Tomer Koren
- 2017 Poster: Affine-Invariant Online Optimization and the Low-rank Experts Problem »
  Tomer Koren · Roi Livni
- 2017 Poster: Multi-Armed Bandits with Metric Movement Costs »
  Tomer Koren · Roi Livni