Poster
Navigating to the Best Policy in Markov Decision Processes
Aymen Al Marjani · Aurélien Garivier · Alexandre Proutiere
We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible. We propose a problem-dependent lower bound on the average number of steps required before a correct answer can be given with probability at least $1-\delta$. We further provide the first algorithm with an instance-specific sample complexity in this setting. This algorithm addresses the general case of communicating MDPs; we also propose a variant with a reduced exploration rate (and hence faster convergence) under an additional ergodicity assumption. This work extends previous results on the \emph{generative setting}~\cite{pmlr-v139-marjani21a}, where the agent could at each step query the random outcome of any (state, action) pair. In contrast, we show here how to deal with the \emph{navigation constraints} induced by the \emph{online setting}. Our analysis relies on an ergodic theorem for non-homogeneous Markov chains, which we consider of wide interest for the analysis of Markov Decision Processes.
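The setting described in the abstract can be illustrated with a toy sketch: an agent navigates a small communicating MDP under navigation constraints (it can only follow trajectories generated by the dynamics, never jump to a chosen state), builds an empirical model from the observed transitions, and plans on that model with value iteration. This is purely illustrative; the two-state MDP, the uniform exploration policy, and the fixed horizon below are assumptions for the example, not the paper's instance-dependent algorithm or its stopping rule.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Optimal policy of a tabular MDP by value iteration.
    P: (S, A, S) transition tensor, R: (S, A) reward matrix."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * P @ V           # (S, A) state-action values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1), V

def explore_online(P, R, horizon, rng):
    """Navigate the MDP with uniform random actions (respecting the
    navigation constraints: each next state is drawn from the true
    dynamics at the current state), then plan on the empirical model."""
    S, A, _ = P.shape
    counts = np.zeros((S, A, S))
    reward_sum = np.zeros((S, A))
    visits = np.zeros((S, A))
    s = 0
    for _ in range(horizon):
        a = rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        counts[s, a, s_next] += 1
        reward_sum[s, a] += R[s, a]
        visits[s, a] += 1
        s = s_next
    # Empirical model; unvisited pairs fall back to a uniform guess.
    P_hat = np.where(visits[..., None] > 0,
                     counts / np.maximum(visits[..., None], 1),
                     1.0 / S)
    R_hat = reward_sum / np.maximum(visits, 1)
    return value_iteration(P_hat, R_hat)[0]

# Hypothetical 2-state, 2-action MDP: action 1 moves to (or keeps the
# agent in) state 1, where action 1 also earns reward 1.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]; P[0, 1] = [0.0, 1.0]
P[1, 0] = [1.0, 0.0]; P[1, 1] = [0.0, 1.0]
R = np.array([[0.0, 0.0], [0.0, 1.0]])

pi_star, _ = value_iteration(P, R)   # optimal policy on the true model
pi_hat = explore_online(P, R, horizon=2000, rng=np.random.default_rng(0))
```

With enough steps, uniform exploration visits every (state, action) pair of this communicating MDP, so the empirical plan `pi_hat` matches the true optimum `pi_star`; the paper's contribution is precisely to replace such naive uniform exploration with an instance-optimal allocation.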
Author Information
Aymen Al Marjani (ENS Lyon)
Aurélien Garivier (ENS Lyon)
Alexandre Proutiere (KTH)
More from the Same Authors
- 2021 Spotlight: Sequential Algorithms for Testing Closeness of Distributions
  Aadil Oufkir · Omar Fawzi · Nicolas Flammarion · Aurélien Garivier
- 2022 Poster: Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
  Andrea Tirinzoni · Aymen Al Marjani · Emilie Kaufmann
- 2021 Poster: Sequential Algorithms for Testing Closeness of Distributions
  Aadil Oufkir · Omar Fawzi · Nicolas Flammarion · Aurélien Garivier
- 2021 Poster: Fast Pure Exploration via Frank-Wolfe
  Po-An Wang · Ruo-Chun Tzeng · Alexandre Proutiere
- 2021 Poster: A/B/n Testing with Control in the Presence of Subpopulations
  Yoan Russac · Christina Katsimerou · Dennis Bohle · Olivier Cappé · Aurélien Garivier · Wouter Koolen
- 2020 Poster: Regret in Online Recommendation Systems
  Kaito Ariu · Narae Ryu · Se-Young Yun · Alexandre Proutiere
- 2020 Poster: Optimal Best-arm Identification in Linear Bandits
  Yassir Jedra · Alexandre Proutiere
- 2019 Poster: Optimal Sampling and Clustering in the Stochastic Block Model
  Se-Young Yun · Alexandre Proutiere
- 2018 Poster: Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
  Emilie Kaufmann · Wouter Koolen · Aurélien Garivier
- 2018 Poster: Exploration in Structured Reinforcement Learning
  Jungseul Ok · Alexandre Proutiere · Damianos Tranos
- 2018 Oral: Exploration in Structured Reinforcement Learning
  Jungseul Ok · Alexandre Proutiere · Damianos Tranos
- 2017 Poster: Minimal Exploration in Structured Stochastic Bandits
  Richard Combes · Stefan Magureanu · Alexandre Proutiere
- 2017 Spotlight: Minimal Exploration in Structured Stochastic Bandits
  Richard Combes · Stefan Magureanu · Alexandre Proutiere
- 2016 Poster: Optimal Cluster Recovery in the Labeled Stochastic Block Model
  Se-Young Yun · Alexandre Proutiere
- 2015 Poster: Fast and Memory Optimal Low-Rank Matrix Approximation
  Se-Young Yun · Marc Lelarge · Alexandre Proutiere
- 2015 Poster: Combinatorial Bandits Revisited
  Richard Combes · M. Sadegh Talebi · Alexandre Proutiere · Marc Lelarge
- 2014 Poster: Streaming, Memory Limited Algorithms for Community Detection
  Se-Young Yun · Marc Lelarge · Alexandre Proutiere
- 2013 Poster: Two-Target Algorithms for Infinite-Armed Bandits with Bernoulli Rewards
  Thomas Bonald · Alexandre Proutiere