Beyond No Regret: Instance-Dependent PAC Reinforcement Learning
Andrew Wagenmaker · Kevin Jamieson
Tue Dec 14 09:00 AM -- 10:00 AM (PST)
The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an $\epsilon$-optimal policy and achieve the worst-case optimal rate, it is unknown whether low-regret algorithms can obtain the instance-optimal rate for policy identification. We show that this is not possible---there exists a fundamental tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate.
Motivated by our negative finding, we propose a new measure of instance-dependent sample complexity for PAC tabular reinforcement learning which explicitly accounts for the attainable state visitation distributions in the underlying MDP. We then propose and analyze a novel, planning-based algorithm which attains this sample complexity---yielding a complexity which scales with the suboptimality gaps and the ``reachability'' of a state.
We show that our algorithm is nearly minimax optimal and, on several examples, that our instance-dependent sample complexity offers significant improvements over worst-case bounds.
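The "simple reduction" from low regret to $\epsilon$-optimal policy identification mentioned in the abstract is usually understood as an online-to-batch conversion. The sketch below is the standard textbook form of that argument, not a statement taken from this listing. The symbols $K$ (number of episodes), $\pi_k$ (policy played in episode $k$), $V^\star$ and $V^{\pi}$ (optimal and policy values from the initial state), and the constant $C$ are assumptions introduced here for illustration. Suppose a low-regret algorithm guarantees

$$
\mathbb{E}[R_K] \;=\; \mathbb{E}\Big[\sum_{k=1}^{K} \big(V^\star - V^{\pi_k}\big)\Big] \;\le\; C\sqrt{K}.
$$

If $\hat{\pi}$ is drawn uniformly at random from $\{\pi_1, \dots, \pi_K\}$, then

$$
\mathbb{E}\big[V^\star - V^{\hat{\pi}}\big] \;=\; \frac{\mathbb{E}[R_K]}{K} \;\le\; \frac{C}{\sqrt{K}} \;\le\; \epsilon
\quad \text{whenever} \quad K \;\ge\; \frac{C^2}{\epsilon^2}.
$$

Under these assumptions the resulting sample complexity scales as $1/\epsilon^2$ with a worst-case constant $C$, which is consistent with the abstract's remark that the reduction recovers the worst-case rate. The argument by itself says nothing about instance-dependent quantities such as suboptimality gaps.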
Author Information
Andrew Wagenmaker (University of Washington)
Kevin Jamieson (University of Washington)
More from the Same Authors
- 2022 Poster: Active Learning with Safety Constraints
  Romain Camilleri · Andrew Wagenmaker · Jamie Morgenstern · Lalit Jain · Kevin Jamieson
- 2022 Poster: Instance-optimal PAC Algorithms for Contextual Bandits
  Zhaoqi Li · Lillian Ratliff · Houssam Nassif · Kevin Jamieson · Lalit Jain
- 2022 Poster: Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design
  Andrew Wagenmaker · Kevin Jamieson
- 2021 Poster: Selective Sampling for Online Best-arm Identification
  Romain Camilleri · Zhihan Xiong · Maryam Fazel · Lalit Jain · Kevin Jamieson
- 2021 Poster: Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers
  Julian Katz-Samuels · Blake Mason · Kevin Jamieson · Rob Nowak
- 2021 Poster: Corruption Robust Active Learning
  Yifang Chen · Simon Du · Kevin Jamieson
- 2020 Poster: An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits
  Julian Katz-Samuels · Lalit Jain · Zohar Karnin · Kevin Jamieson
- 2019 Poster: A New Perspective on Pool-Based Active Classification and False-Discovery Control
  Lalit Jain · Kevin Jamieson
- 2019 Poster: Sequential Experimental Design for Transductive Linear Bandits
  Lalit Jain · Kevin Jamieson · Tanner Fiez · Lillian Ratliff
- 2019 Poster: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
  Max Simchowitz · Kevin Jamieson
- 2018 Poster: A Bandit Approach to Sequential Experimental Design with False Discovery Control
  Kevin Jamieson · Lalit Jain