Poster
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
Ahmadreza Moradipari · Mohammad Pedramfar · Modjtaba Shokrian Zini · Vaneet Aggarwal
Great Hall & Hall B1+B2 (level 1) #1827
Abstract:
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We present a refined analysis of the information ratio, and show an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results improve the state-of-the-art.