NeurIPS Poster Generalized Weighted Path Consistency for Mastering Atari Games

Dengwei Zhao · Shikui Tu · Lei Xu

Great Hall & Hall B1+B2 (level 1) #1426

Abstract: Reinforcement learning with neural-guided search consumes huge computational resources to achieve remarkable performance. Path consistency (PC), i.e., the requirement that $f$ values on one optimal path be identical, was previously imposed on MCTS by PCZero to improve the learning efficiency of AlphaZero. However, PCZero still lacks theoretical support and considers only board games. In this paper, PCZero is generalized into GW-PCZero for real applications with non-zero immediate reward. A weighting mechanism is introduced to reduce the variance caused by scouting's uncertainty on the $f$ value estimation. For the first time, it is theoretically proved that neural-guided MCTS is guaranteed to find the optimal solution under the constraint of PC. Experiments are conducted on the Atari $100$k benchmark with $26$ games, and GW-PCZero achieves $198\%$ mean human performance, higher than the state-of-the-art EfficientZero's $194\%$, while consuming only $25\%$ of the computational resources consumed by EfficientZero.
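
As a rough illustration of the path-consistency idea described in the abstract, the sketch below computes $f$ values along one path with non-zero immediate rewards and penalizes their weighted deviation from a common mean. It is a minimal sketch under stated assumptions, not the formulation from the paper: the definition of $f$ (discounted reward accumulated so far plus the discounted value estimate), the function names `f_values` and `weighted_pc_loss`, the discount `gamma`, and the per-node `weights` are all hypothetical choices made for illustration.

```python
import numpy as np


def f_values(rewards, values, gamma=0.997):
    """Assumed definition: f_t = sum_{i<t} gamma^i * r_i + gamma^t * v_t.

    rewards[i] is the reward on the transition from step i to step i + 1,
    so len(rewards) must equal len(values) - 1.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    if len(rewards) != len(values) - 1:
        raise ValueError("expected len(rewards) == len(values) - 1")
    discounts = gamma ** np.arange(len(values))
    # g_t: discounted reward collected before reaching step t (g_0 = 0).
    g = np.concatenate(([0.0], np.cumsum(discounts[:-1] * rewards)))
    return g + discounts * values


def weighted_pc_loss(f, weights):
    """Weighted squared deviation of f values from their weighted mean.

    The weights are a hypothetical stand-in for a confidence score per node;
    they are normalized to sum to one before use.
    """
    f = np.asarray(f, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    f_bar = np.sum(w * f)
    return float(np.sum(w * (f - f_bar) ** 2))


# Usage: a path whose value estimates agree with the rewards has zero loss,
# while an over-estimated middle value yields a positive penalty.
consistent = f_values(rewards=[1, 1], values=[2, 1, 0], gamma=1.0)
inconsistent = f_values(rewards=[1, 1], values=[2, 2, 0], gamma=1.0)
loss_consistent = weighted_pc_loss(consistent, weights=[1, 1, 1])
loss_inconsistent = weighted_pc_loss(inconsistent, weights=[1, 1, 1])
```

In this toy setting, lowering the weight of a node whose estimate is less trusted reduces its influence on both the mean and the penalty, which is one plausible way a weighting mechanism could limit the effect of uncertain estimates.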
