

Poster

Model-free Posterior Sampling via Learning Rate Randomization

Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Pierre Perrault · Michal Valko · Pierre Ménard

Great Hall & Hall B1+B2 (level 1) #1301

Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{O}(\sqrt{H^5 S A T})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, RandQL enjoys a regret bound of order $\widetilde{O}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without using bonuses, relying instead on a novel idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.
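To illustrate the learning rate randomization idea mentioned in the abstract, below is a minimal tabular sketch: each member of a small Q-value ensemble is updated with its own Beta-distributed learning rate (whose distribution concentrates as the visit count grows), and the policy acts greedily with respect to the ensemble maximum, replacing an explicit exploration bonus. The ensemble size J, the Beta(kappa*(H+1), kappa*(n+1)) learning-rate distribution, the toy chain environment, and all constants are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 5, 2, 6     # states, actions, horizon (assumed toy sizes)
J = 10                # ensemble size (assumption)
kappa = 1.0           # learning-rate inflation constant (assumption)
T = 2000              # number of episodes

# Optimistic initialization at the maximal return H.
Q_ens = np.full((J, H, S, A), float(H))   # ensemble of temporary Q-values
Q_pol = np.full((H, S, A), float(H))      # policy Q-values
n = np.zeros((H, S, A), dtype=int)        # visit counts

def step(s, a):
    """Toy deterministic chain: action 1 moves right, reward at the far end."""
    s2 = min(s + 1, S - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == S - 1 else 0.0)

for t in range(T):
    s = 0
    for h in range(H):
        a = int(np.argmax(Q_pol[h, s]))        # greedy w.r.t. policy Q-values
        s2, r = step(s, a)
        v_next = Q_pol[h + 1, s2].max() if h + 1 < H else 0.0
        # Key idea: no bonus term. Each ensemble member gets its own random
        # learning rate; randomness in the rates plays the role of posterior
        # sampling, and the Beta distribution concentrates as n grows.
        w = rng.beta(kappa * (H + 1), kappa * (n[h, s, a] + 1), size=J)
        Q_ens[:, h, s, a] = (1 - w) * Q_ens[:, h, s, a] + w * (r + v_next)
        # Taking the max over the ensemble yields optimism with high probability.
        Q_pol[h, s, a] = Q_ens[:, h, s, a].max()
        n[h, s, a] += 1
        s = s2

print("greedy first-step action:", int(np.argmax(Q_pol[0, 0])))
```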
