Timezone: »
Poster
Finding Correlated Equilibrium of Constrained Markov Game: A Primal-Dual Approach
Ziyi Chen · Shaocong Ma · Yi Zhou
Constrained Markov game is a fundamental problem that covers many applications, where multiple players compete with each other under behavioral constraints. The existing literature has proved the existence of Nash equilibrium for constrained Markov games, which turns out to be PPAD-complete and cannot be computed in polynomial time. In this work, we propose a surrogate notion of correlated equilibrium (CE) for constrained Markov games that can be computed in polynomial time, and study its fundamental properties. We show that the modification structure of CE of constrained Markov games is fundamentally different from that of unconstrained Markov games. Moreover, we prove that the corresponding Lagrangian function has zero duality gap. Based on this result, we develop the first primal-dual algorithm that provably converges to CE of constrained Markov games. In particular, we prove that both the duality gap and the constraint violation of the output policy converge at the rate $\mathcal{O}(\frac{1}{\sqrt{T}})$. Moreover, when adopting the V-learning algorithm as the subroutine in the primal update, our algorithm achieves an approximate CE with $\epsilon$ duality gap with the sample complexity $\mathcal{O}(H^9|\mathcal{S}||\mathcal{A}|^{2} \epsilon^{-4})$.
Author Information
Ziyi Chen (University of Utah)
Shaocong Ma (University of Utah)
Yi Zhou (University of Utah)
More from the Same Authors
-
2021 Poster: Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation »
Yue Wang · Shaofeng Zou · Yi Zhou -
2020 Poster: A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning »
Bhavya Kailkhura · Jayaraman Thiagarajan · Qunwei Li · Jize Zhang · Yi Zhou · Timo Bremer -
2020 Poster: Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis »
Shaocong Ma · Yi Zhou · Shaofeng Zou -
2019 Poster: SpiderBoost and Momentum: Faster Variance Reduction Algorithms »
Zhe Wang · Kaiyi Ji · Yi Zhou · Yingbin Liang · Vahid Tarokh