Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam, has been challenging because there were no rigorously proven SDE approximations for these methods. This paper derives the SDE approximations for RMSprop and Adam, giving theoretical guarantees of their correctness as well as experimental validation of their applicability to common large-scale vision and language settings. A key practical result is the derivation of a square root scaling rule to adjust the optimization hyperparameters of RMSprop and Adam when changing batch size, and its empirical validation in deep learning settings.
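The abstract names the square root scaling rule but does not spell it out. As a minimal sketch, assume the rule takes the following form for Adam: when the batch size is multiplied by a factor κ, the learning rate is multiplied by √κ, each 1 − β is multiplied by κ, and ε is divided by √κ. Only the square root dependence on batch size is stated in the abstract. The treatment of β1, β2 and ε here is an assumption and should be checked against the paper itself. The names `AdamHParams` and `scale_adam_hparams` are hypothetical helpers, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdamHParams:
    """Hypothetical container for the Adam hyperparameters being rescaled."""
    lr: float
    beta1: float
    beta2: float
    eps: float


def scale_adam_hparams(hp: AdamHParams, old_batch: int, new_batch: int) -> AdamHParams:
    """Rescale Adam hyperparameters for a new batch size.

    Assumed form of the rule (kappa = new_batch / old_batch):
      lr    -> lr * sqrt(kappa)
      beta  -> 1 - kappa * (1 - beta)   for both beta1 and beta2
      eps   -> eps / sqrt(kappa)
    """
    if old_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")

    kappa = new_batch / old_batch
    sqrt_kappa = math.sqrt(kappa)

    beta1 = 1.0 - kappa * (1.0 - hp.beta1)
    beta2 = 1.0 - kappa * (1.0 - hp.beta2)

    # For large kappa the rescaled betas leave [0, 1); the rule cannot be
    # applied there, so fail loudly rather than return invalid values.
    if not (0.0 <= beta1 < 1.0 and 0.0 <= beta2 < 1.0):
        raise ValueError(f"batch size ratio {kappa} pushes beta outside [0, 1)")

    return AdamHParams(
        lr=hp.lr * sqrt_kappa,
        beta1=beta1,
        beta2=beta2,
        eps=hp.eps / sqrt_kappa,
    )


if __name__ == "__main__":
    base = AdamHParams(lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8)
    # Quadruple the batch size from 256 to 1024, so kappa = 4.
    print(scale_adam_hparams(base, old_batch=256, new_batch=1024))
```

Under these assumptions, the range check matters in practice: with β1 = 0.9, any ratio κ above 10 makes the rescaled β1 negative, so the sketch raises an error instead of returning it.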
Author Information
Sadhika Malladi (Princeton University)
Kaifeng Lyu (Princeton University)
Abhishek Panigrahi (Princeton University)
Sanjeev Arora (Princeton University)
More from the Same Authors
-
2022 : Why (and When) does Local SGD Generalize Better than SGD? »
Xinran Gu · Kaifeng Lyu · Longbo Huang · Sanjeev Arora -
2023 Workshop: Mathematics of Modern Machine Learning (M3L) »
Aditi Raghunathan · Alex Damian · Bingbin Liu · Christina Baek · Kaifeng Lyu · Surbhi Goel · Tengyu Ma · Zhiyuan Li -
2022 : Poster Session 2 »
Jinwuk Seok · Bo Liu · Ryotaro Mitsuboshi · David Martinez-Rubio · Weiqiang Zheng · Ilgee Hong · Chen Fan · Kazusato Oko · Bo Tang · Miao Cheng · Aaron Defazio · Tim G. J. Rudner · Gabriele Farina · Vishwak Srinivasan · Ruichen Jiang · Peng Wang · Jane Lee · Nathan Wycoff · Nikhil Ghosh · Yinbin Han · David Mueller · Liu Yang · Amrutha Varshini Ramesh · Siqi Zhang · Kaifeng Lyu · David Yunis · Kumar Kshitij Patel · Fangshuo Liao · Dmitrii Avdiukhin · Xiang Li · Sattar Vakili · Jiaxin Shi -
2022 Poster: New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound »
Arushi Gupta · Nikunj Saunshi · Dingli Yu · Kaifeng Lyu · Sanjeev Arora -
2022 Poster: Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent »
Zhiyuan Li · Tianhao Wang · Jason Lee · Sanjeev Arora -
2022 Poster: Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction »
Kaifeng Lyu · Zhiyuan Li · Sanjeev Arora -
2021 : Invited talk 2 »
Sanjeev Arora -
2021 Oral: Evaluating Gradient Inversion Attacks and Defenses in Federated Learning »
Yangsibo Huang · Samyak Gupta · Zhao Song · Kai Li · Sanjeev Arora -
2021 Poster: On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) »
Zhiyuan Li · Sadhika Malladi · Sanjeev Arora -
2021 Poster: Evaluating Gradient Inversion Attacks and Defenses in Federated Learning »
Yangsibo Huang · Samyak Gupta · Zhao Song · Kai Li · Sanjeev Arora -
2021 Poster: Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias »
Kaifeng Lyu · Zhiyuan Li · Runzhe Wang · Sanjeev Arora -
2021 Poster: Learning and Generalization in RNNs »
Abhishek Panigrahi · Navin Goyal -
2020 : Keynote speech: Sanjeev Arora (PGDL) »
Sanjeev Arora · Yiding Jiang -
2020 Poster: Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate »
Zhiyuan Li · Kaifeng Lyu · Sanjeev Arora -
2020 Poster: Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality »
Yi Zhang · Orestis Plevrakis · Simon Du · Xingguo Li · Zhao Song · Sanjeev Arora -
2019 : Poster session »
Sebastian Farquhar · Erik Daxberger · Andreas Look · Matt Benatan · Ruiyi Zhang · Marton Havasi · Fredrik Gustafsson · James A Brofos · Nabeel Seedat · Micha Livne · Ivan Ustyuzhaninov · Adam Cobb · Felix D McGregor · Patrick McClure · Tim R. Davidson · Gaurush Hiranandani · Sanjeev Arora · Masha Itkina · Didrik Nielsen · William Harvey · Matias Valdenegro-Toro · Stefano Peluchetti · Riccardo Moriconi · Tianyu Cui · Vaclav Smidl · Taylan Cemgil · Jack Fitzsimons · He Zhao · · mariana vargas vieyra · Apratim Bhattacharyya · Rahul Sharma · Geoffroy Dubourg-Felonneau · Jonathan Warrell · Slava Voloshynovskiy · Mihaela Rosca · Jiaming Song · Andrew Ross · Homa Fashandi · Ruiqi Gao · Hooshmand Shokri Razaghi · Joshua Chang · Zhenzhong Xiao · Vanessa Boehm · Giorgio Giannone · Ranganath Krishnan · Joe Davison · Arsenii Ashukha · Jeremiah Liu · Sicong (Sheldon) Huang · Evgenii Nikishin · Sunho Park · Nilesh Ahuja · Mahesh Subedar · · Artyom Gadetsky · Jhosimar Arias Figueroa · Tim G. J. Rudner · Waseem Aslam · Adrián Csiszárik · John Moberg · Ali Hebbal · Kathrin Grosse · Pekka Marttinen · Bang An · Hlynur Jónsson · Samuel Kessler · Abhishek Kumar · Mikhail Figurnov · Omesh Tickoo · Steindor Saemundsson · Ari Heljakka · Dániel Varga · Niklas Heim · Simone Rossi · Max Laves · Waseem Gharbieh · Nicholas Roberts · Luis Armando Pérez Rey · Matthew Willetts · Prithvijit Chakrabarty · Sumedh Ghaisas · Carl Shneider · Wray Buntine · Kamil Adamczewski · Xavier Gitiaux · Suwen Lin · Hao Fu · Gunnar Rätsch · Aidan Gomez · Erik Bodin · Dinh Phung · Lennart Svensson · Juliano Tusi Amaral Laganá Pinto · Milad Alizadeh · Jianzhun Du · Kevin Murphy · Beatrix Benkő · Shashaank Vattikuti · Jonathan Gordon · Christopher Kanan · Sontje Ihler · Darin Graham · Michael Teng · Louis Kirsch · Tomas Pevny · Taras Holotyak -
2019 Poster: Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets »
Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora -
2019 Poster: Implicit Regularization in Deep Matrix Factorization »
Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo -
2019 Spotlight: Implicit Regularization in Deep Matrix Factorization »
Sanjeev Arora · Nadav Cohen · Wei Hu · Yuping Luo -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2018 : Plenary Talk 1 »
Sanjeev Arora -
2017 Workshop: Deep Learning: Bridging Theory and Practice »
Sanjeev Arora · Maithra Raghu · Russ Salakhutdinov · Ludwig Schmidt · Oriol Vinyals -
2012 Poster: Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders »
Sanjeev Arora · Rong Ge · Ankur Moitra · Sushant Sachdeva