Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
Ke Sun · Yafei Wang · Yi Liu · Yingnan Zhao · Bo Pan · Shangling Jui · Bei Jiang · Linglong Kong

Fri Dec 10 08:30 AM -- 10:00 AM (PST)

Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms to accelerate convergence and improve the sampling efficiency of deep RL. Despite these empirical gains, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The analysis is rooted in the fixed-point iteration nature of RL. We further propose a stabilization strategy that introduces a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator, which together allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.
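To make the fixed-point view concrete, the following is a minimal sketch, not the authors' implementation. It applies Anderson mixing with a memory of one previous iterate to a MellowMax Bellman operator on a tiny tabular MDP. Several things here are assumptions made for illustration: the MDP (`P`, `R`, `GAMMA`), the damping parameter `beta`, the Tikhonov-style regularizer `lam` standing in for the regularization term, and the fallback to a plain fixed-point step whenever the mixed step does not reduce the residual.

```python
import math

def mellowmax(values, omega=5.0):
    """MellowMax: (1/omega) * log(mean(exp(omega * v))), shifted by max for stability."""
    m = max(values)
    mean_exp = sum(math.exp(omega * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / omega

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
# P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
GAMMA = 0.9

def bellman(q):
    """MellowMax Bellman operator on a flat vector [Q(0,0), Q(0,1), Q(1,0), Q(1,1)]."""
    v = [mellowmax(q[2 * s:2 * s + 2]) for s in range(2)]
    return [R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(2))
            for s in range(2) for a in range(2)]

def residual(x):
    """Fixed-point residual g(x) - x."""
    return [gi - xi for gi, xi in zip(bellman(x), x)]

def sup_norm(x):
    return max(abs(xi) for xi in x)

def anderson_solve(x0, beta=1.0, lam=1e-8, tol=1e-8, max_iter=500):
    """Damped, regularized Anderson mixing with memory 1 (assumed form)."""
    x, x_prev, f_prev = list(x0), None, None
    for k in range(max_iter):
        f = residual(x)
        if sup_norm(f) < tol:
            return x, k
        plain = [xi + fi for xi, fi in zip(x, f)]  # plain fixed-point step g(x)
        nxt = plain
        if x_prev is not None:
            dx = [a - b for a, b in zip(x, x_prev)]
            df = [a - b for a, b in zip(f, f_prev)]
            # Regularized least squares for the single mixing coefficient:
            # minimize ||f - alpha * df||^2 + lam * alpha^2.
            alpha = sum(a * b for a, b in zip(f, df)) / (sum(d * d for d in df) + lam)
            cand = [xi - alpha * dxi + beta * (fi - alpha * dfi)
                    for xi, dxi, fi, dfi in zip(x, dx, f, df)]
            # Assumed safeguard: keep the mixed step only if it is no worse
            # than the plain step in residual sup-norm.
            if sup_norm(residual(cand)) <= sup_norm(residual(plain)):
                nxt = cand
        x_prev, f_prev, x = x, f, nxt
    return x, max_iter

q_star, iterations = anderson_solve([0.0] * 4)
```

Because MellowMax is non-expansive, the operator above is a contraction with factor `GAMMA` in the sup-norm, so the plain step used as a fallback always shrinks the residual. The mixed step is accepted only when it does at least as well, which keeps this sketch convergent regardless of how the mixing coefficient behaves.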

Author Information

Ke Sun (University of Alberta)
Yafei Wang (University of Alberta)
Yi Liu (University of Alberta)
Yingnan Zhao (Harbin Institute of Technology)
Bo Pan (University of Alberta)
Shangling Jui (Huawei)

Dr. Jui is the chief AI scientist of the Huawei Kirin team. His knowledge of AI and reinforcement learning has guided the team in building the ecosystem of the Kirin platform. He supports decisions on AI investment in Canadian universities, including UBC, SFU, the University of Toronto, the University of Alberta, and the University of Waterloo, among others, through joint lab collaborations and local Huawei offices.

Bei Jiang (University of Alberta)
Linglong Kong (University of Alberta)
