Aligning Distributionally Robust Optimization with Practical Deep Learning Needs
Dmitrii Feoktistov · Igor Ignashin · Andrey Veprikov · Nikita Borovko · Aleksandr Bogdanov · Savelii Chezhegov · Aleksandr Beznosikov
Abstract
While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between DRO and current DL practice: modern DL optimizers require adaptivity and the ability to handle stochastic gradients, since these properties underlie their strong performance. This paper aims to bridge this gap by introducing ALSO -- Adaptive Loss Scaling Optimizer -- an adaptive DRO algorithm suitable for DL. We prove the convergence of the proposed algorithm for non-convex objectives, the standard setting for DL models. Empirical evaluation demonstrates that ALSO outperforms baseline methods.
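To make the core idea of DRO-style adaptive loss scaling concrete, below is a minimal sketch that is not the authors' ALSO algorithm: it combines a standard adaptive stochastic optimizer with per-sample weights obtained by exponential tilting of the per-sample losses (a common KL-DRO heuristic). The toy model, data, and the temperature parameter `tau` are illustrative assumptions.

```python
# Sketch of DRO-style adaptive loss scaling on top of a stochastic optimizer.
# NOT the paper's ALSO method; a generic illustration of sample reweighting.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy binary classification data with 20 features (illustrative only).
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # any adaptive optimizer
tau = 1.0  # tilting temperature: larger tau pushes weights toward uniform

for step in range(200):
    idx = torch.randint(0, X.size(0), (64,))              # stochastic mini-batch
    logits = model(X[idx])
    per_sample_loss = F.cross_entropy(logits, y[idx], reduction="none")

    # DRO-style weights: harder samples (higher loss) receive larger weight.
    # As tau -> infinity, softmax(loss / tau) recovers uniform averaging,
    # i.e. the standard equally weighted loss.
    with torch.no_grad():
        weights = torch.softmax(per_sample_loss / tau, dim=0)

    loss = (weights * per_sample_loss).sum()               # adaptively scaled loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The design choice to detach the weights (computing them under `torch.no_grad()`) keeps the update a weighted average of ordinary per-sample gradients, which is the typical structure analyzed in DRO-for-DL convergence results.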