[Re] AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Anirudh Buvanesh · Madhur Panwar

Keywords: [ ReScience - MLRC 2021 ] [ Journal Track ]


Optimization is central to machine learning and remains an active area of research. A better optimization algorithm reaches better optima faster. AdaBelief is an optimizer that claims to offer 1) fast convergence as in adaptive methods, 2) good generalization as in SGD, and 3) training stability. This report presents experiments that validate these claims and test the effectiveness of AdaBelief. We first reproduce the experiments from AdaBelief's paper, which cover a variety of datasets spanning multiple domains, including Image Classification, Language Modeling, Generative Modeling, and Reinforcement Learning. We then perform several analyses targeted at AdaBelief's claims and find that the convergence speed and training stability of AdaBelief are comparable to those of adaptive gradient optimizers. However, AdaBelief does not generalize as well as SGD. Nevertheless, it is a promising optimizer that combines the best of both worlds: accelerated and adaptive gradient methods.
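For readers unfamiliar with the optimizer under test, the following is a minimal sketch of a single AdaBelief-style update on a scalar parameter, as the method is commonly described. The abstract does not state the update rule, so treat this as an illustration rather than the code used in the report. The function name `adabelief_step` and the default hyperparameters are assumptions made for this example.

```python
import math


def adabelief_step(theta, grad, m, s, t,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief-style update for a scalar parameter (illustrative sketch).

    theta: current parameter value
    grad:  gradient of the loss at theta
    m, s:  running statistics from the previous step (start both at 0.0)
    t:     1-based step counter, used for bias correction
    """
    # Exponential moving average of the gradient: the "predicted" gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    # Moving average of the squared deviation of the observed gradient from
    # its prediction. Adam tracks grad ** 2 here instead.
    s = beta2 * s + (1.0 - beta2) * (grad - m) ** 2 + eps
    # Bias-correct both statistics, as in Adam.
    m_hat = m / (1.0 - beta1 ** t)
    s_hat = s / (1.0 - beta2 ** t)
    # Small deviation (high "belief" in the gradient) gives a larger step.
    theta = theta - lr * m_hat / (math.sqrt(s_hat) + eps)
    return theta, m, s
```

The only difference from an Adam-style step in this sketch is the second-moment statistic: the step size depends on how far the observed gradient is from its running prediction, not on the gradient's raw magnitude.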
