Poster
Dynamic Regret of Adversarial Linear Mixture MDPs
Long-Fei Li · Peng Zhao · Zhi-Hua Zhou
Great Hall & Hall B1+B2 (level 1) #1417
Abstract:
We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and an unknown transition kernel. We consider linear mixture MDPs, whose transition kernel is a linear mixture model, and choose the \emph{dynamic regret} as the performance measure. In terms of the dimension of the feature mapping, the horizon, the number of episodes, and the non-stationarity measure, we propose a novel algorithm with a dynamic regret guarantee that holds when the non-stationarity measure is known. This guarantee improves on the previously best-known dynamic regret bounds for adversarial linear mixture MDPs and adversarial tabular MDPs. We also establish a lower bound, indicating that our algorithm is \emph{optimal} in two of these quantities. Furthermore, when the non-stationarity measure is unknown, we design an online ensemble algorithm with a meta-base structure, which provably achieves a dynamic regret bound that involves the expected switching number of the best base-learner. This result can be optimal under certain regimes.