Poster
Generalized Weighted Path Consistency for Mastering Atari Games
Dengwei Zhao · Shikui Tu · Lei Xu
Great Hall & Hall B1+B2 (level 1) #1426
Abstract:
Reinforcement learning with neural-guided search consumes huge computational resources to achieve remarkable performance. Path consistency (PC), i.e., the property that values along one optimal path should be identical, was previously imposed on MCTS by PCZero to improve the learning efficiency of AlphaZero. However, PCZero still lacks theoretical support and considers only board games. In this paper, PCZero is generalized into GW-PCZero for real applications with non-zero immediate rewards. A weighting mechanism is introduced to reduce the variance caused by scouting's uncertainty in the value estimation. For the first time, it is theoretically proved that neural-guided MCTS is guaranteed to find the optimal solution under the constraint of PC. Experiments are conducted on the Atari k benchmark with games and GW-PCZero achieves mean human performance, higher than the state-of-the-art EfficientZero's , while consuming only of the computational resources consumed by EfficientZero.
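To make the path consistency idea concrete, the sketch below shows one way a PC-style penalty could be computed along a single path when immediate rewards are non-zero. It is an illustration under stated assumptions, not the GW-PCZero loss: it assumes an undiscounted setting, in which the accumulated reward g_t plus the optimal value v_t is the same for every state on an optimal path, and it penalizes the weighted squared deviation of f_t = g_t + v_t from its weighted mean. The function name `path_consistency_penalty` and the per-state `weights` are hypothetical; the abstract does not specify the form of the weighting mechanism.

```python
from itertools import accumulate


def path_consistency_penalty(values, rewards, weights=None):
    """Weighted squared deviation of f_t = g_t + v_t from its weighted mean.

    values[t]  : estimated value of the t-th state on the path
    rewards[t] : reward received when leaving the t-th state
    weights[t] : hypothetical non-negative confidence weight for state t
                 (an assumption for illustration; defaults to uniform)
    """
    n = len(values)
    if n == 0 or len(rewards) != n:
        raise ValueError("values and rewards must be non-empty and equal length")
    if weights is None:
        weights = [1.0] * n
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")

    # g_t: reward accumulated before reaching state t, so g_0 = 0.
    g = [0.0] + list(accumulate(rewards))[:-1]
    # f_t should be constant along an optimal path (undiscounted assumption).
    f = [g_t + v_t for g_t, v_t in zip(g, values)]

    f_bar = sum(w * x for w, x in zip(weights, f)) / total_w
    return sum(w * (x - f_bar) ** 2 for w, x in zip(weights, f)) / total_w
```

With all rewards set to zero, the penalty reduces to the variance of the value estimates along the path, which matches the board-game reading of PC as "values on one optimal path should be identical". Down-weighting states whose estimates are less trusted is one plausible way to reduce variance, but the actual weighting used by GW-PCZero should be taken from the paper.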