

Poster

Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

Chenhao Zhou · Zebang Shen · Chao Zhang · Hanbin Zhao · Hui Qian

Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization (SDEPO) for two-player zero-sum stochastic Markov games with continuous state spaces and finite action spaces. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a finite-dimensional linear approximation to the state-action value function. We explicitly analyze the approximation error in policy evaluation and show that SDEPO achieves an $\tilde{O}(\frac{1}{(1-\gamma)^3\epsilon})$ last-iterate convergence rate to the $\epsilon$-optimal Nash equilibrium, which is independent of the cardinality of the state space. This complexity matches the best-known results for the global convergence of policy gradient algorithms in the single-agent setting. Moreover, we propose a practical variant of SDEPO to handle continuous action spaces, and empirical results demonstrate the practical superiority of the proposed method.
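For intuition, below is a minimal sketch of the kind of policy evaluation the abstract describes: random Fourier features stand in for a spectral embedding of the dynamics, and a least-squares temporal-difference solve fits a linear model of the state-action value function. All names, dimensions, and hyperparameters here (`n_features`, `bandwidth`, `reg`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_fourier_features(states, actions, n_features=128, bandwidth=1.0, seed=0):
    """Map (state, action) pairs to a finite-dimensional feature vector.

    Approximates a Gaussian-kernel feature map with random Fourier
    features. Discrete actions are assumed to be one-hot encoded
    before concatenation with the state.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([states, actions], axis=1)  # shape (n, d_s + d_a)
    d = x.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

def evaluate_policy(phi, rewards, phi_next, gamma=0.99, reg=1e-3):
    """LSTD-style estimate of Q(s, a) ~= phi(s, a) @ theta.

    phi:      features of visited (state, action) pairs, shape (n, k)
    phi_next: features of the next (state, action) pairs under the
              policy being evaluated, shape (n, k)
    rewards:  observed one-step rewards, shape (n,)
    """
    # Solve the (regularized) projected Bellman equation in feature space.
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
```

With these two pieces, a natural policy gradient step would update each player's policy using the fitted linear value estimate; that outer optimization loop is omitted here since the abstract does not spell it out.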
