Skip to yearly menu bar Skip to main content


Poster

Dynamic Regret of Adversarial Linear Mixture MDPs

Long-Fei Li · Peng Zhao · Zhi-Hua Zhou

Great Hall & Hall B1+B2 (level 1) #1417

Abstract: We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and the unknown transition kernel. We consider the linear mixture MDPs whose transition kernel is a linear mixture model and choose the \emph{dynamic regret} as the performance measure. Denote by d the dimension of the feature mapping, H the horizon, K the number of episodes, PT the non-stationary measure, we propose a novel algorithm that enjoys an O~(d2H3K+H4(K+PT)(1+PT)) dynamic regret under the condition that PT is known, which improves previously best-known dynamic regret for adversarial linear mixture MDP and adversarial tabular MDPs. We also establish an Ω(d2H3K+HK(H+PT)) lower bound, indicating our algorithm is \emph{optimal} in K and PT. Furthermore, when the non-stationary measure PT is unknown, we design an online ensemble algorithm with a meta-base structure, which is proved to achieve an O~(d2H3K+H4(K+PT)(1+PT)+H2ST2) dynamic regret and here ST is the expected switching number of the best base-learner. The result can be optimal under certain regimes.

Chat is not available.