On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs

Yi Wan · Richard Sutton
