Poster
Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Juntang Zhuang · Yifan Ding · Tommy Tang · Nicha Dvornek · Sekhar C Tatikonda · James Duncan
We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer that combines centering of the second momentum with an asynchronous update (i.e., for the $t$-th update, the denominator uses information only up to step $t-1$, while the numerator uses the gradient at step $t$). ACProp has both strong theoretical properties and strong empirical performance. Using the example of Reddi et al. (2018), we show that asynchronous optimizers (e.g. AdaShift, ACProp) have a weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief); among asynchronous optimizers, we show that centering of the second momentum further weakens the convergence condition. We demonstrate that ACProp has a convergence rate of $O(\frac{1}{\sqrt{T}})$ in the stochastic non-convex case, which matches the oracle rate and improves on the $O(\frac{\log T}{\sqrt{T}})$ rate of RMSProp and Adam. We validate ACProp in extensive empirical studies: it outperforms both SGD and other adaptive optimizers in image classification with CNNs, and outperforms well-tuned adaptive optimizers in the training of various GAN models, reinforcement learning, and transformers. To sum up, ACProp has good theoretical properties, including a weak convergence condition and an optimal convergence rate, and strong empirical performance, including SGD-like generalization and Adam-like training stability. We provide the implementation at \url{https://github.com/juntang-zhuang/ACProp-Optimizer}.
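The two ingredients named in the abstract can be sketched in a few lines: the denominator of the step uses the centered second-momentum estimate from the *previous* step (asynchronous update), while the numerator uses the current gradient. The sketch below is a minimal, illustrative NumPy version, not the official implementation (see the linked repository for that); the hyperparameter defaults, zero initialization of the state, and the omission of bias correction are assumptions made for brevity.

```python
import numpy as np

def acprop_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ACProp-style update (illustrative sketch).

    Asynchronous update: the denominator uses the centered second-momentum
    estimate s_{t-1} built from gradients up to step t-1, while the numerator
    uses the current gradient g_t.
    """
    m, s = state["m"], state["s"]  # EMA of gradients; centered second momentum
    # Parameter step uses s from the PREVIOUS iteration (asynchronous update).
    theta = theta - lr * grad / (np.sqrt(s) + eps)
    # Only now fold the current gradient into the running statistics.
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2  # centered: deviation from EMA
    state["m"], state["s"] = m, s
    return theta, state
```

With `s` initialized to zero the very first step is large (the denominator is just `eps`), which is why practical implementations add warmup or bias correction; the sketch only shows the order of operations that makes the update asynchronous.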
Author Information
Juntang Zhuang (Yale University)
Yifan Ding (University of Central Florida)
Tommy Tang (University of Illinois Urbana-Champaign)
Nicha Dvornek (Yale University)
Sekhar C Tatikonda (Yale University)
James Duncan (Yale University)
More from the Same Authors
- 2022 Keynote: Session 2 Keynote 2 » James Duncan
- 2022 Poster: Class-Aware Adversarial Transformers for Medical Image Segmentation » Chenyu You · Ruihan Zhao · Fenglin Liu · Siyuan Dong · Sandeep Chinchali · Ufuk Topcu · Lawrence Staib · James Duncan
- 2020 Poster: AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients » Juntang Zhuang · Tommy Tang · Yifan Ding · Sekhar C Tatikonda · Nicha Dvornek · Xenophon Papademetris · James Duncan
- 2020 Spotlight: AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients » Juntang Zhuang · Tommy Tang · Yifan Ding · Sekhar C Tatikonda · Nicha Dvornek · Xenophon Papademetris · James Duncan
- 2017 Poster: Accelerated consensus via Min-Sum Splitting » Patrick Rebeschini · Sekhar C Tatikonda
- 2014 Poster: Testing Unfaithful Gaussian Graphical Models » De Wen Soh · Sekhar C Tatikonda