Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs
Tianhao Wu · Matthew Zurek · Yudong Chen · Weina Wang · Qiaomin Xie
Abstract
We study learning in average-reward weakly coupled Markov decision processes (WCMDPs) with heterogeneous arms. Naive approaches suffer exponential computation and sample complexity in the number of subsystems. We study a plug-in approach built on an efficient planning algorithm, which attains the first finite-sample (PAC) optimality-gap guarantees with polynomial sample complexity. This result is established under a new framework built on a Lyapunov analysis of a reference policy combined with a Lyapunov drift transfer technique, which can be viewed as a generalization of the classical simulation lemma.
Chat is not available.
Successful Page Load