Timezone: »
Poster
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
Zhihan Xiong · Ruoqi Shen · Qiwen Cui · Maryam Fazel · Simon Du
We study algorithms using randomized value functions for exploration in reinforcement learning. This type of algorithms enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a Bernstein-type magnitude of noise, we obtain a worst-case $\widetilde{O}\left(H\sqrt{SAT}\right)$ regret bound for episodic time-inhomogeneous Markov Decision Process where $S$ is the size of state space, $A$ is the size of action space, $H$ is the planning horizon and $T$ is the number of interactions. This bound polynomially improves all existing bounds for algorithms based on randomized value functions, and for the first time, matches the $\Omega\left(H\sqrt{SAT}\right)$ lower bound up to logarithmic factors. Our result highlights that randomized exploration can be near-optimal, which was previously achieved only by optimistic algorithms. To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.
Author Information
Zhihan Xiong (University of Washington)
Ruoqi Shen (University of Washington)
Qiwen Cui (Department of Computer Science, University of Washington)
Maryam Fazel (University of Washington)
Simon Du (University of Washington)
More from the Same Authors
-
2022 Poster: Provable General Function Class Representation Learning in Multitask Bandits and MDP »
Rui Lu · Andrew Zhao · Simon Du · Gao Huang -
2022 : Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization »
Runlong Zhou · Yuandong Tian · YI WU · Simon Du -
2022 : On the Global Convergence of the Regularized Generalized Gauss-Newton Algorithm »
Vincent Roulet · Maryam Fazel · Siddhartha Srinivasa · Zaid Harchaoui -
2023 Poster: Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer »
Yuandong Tian · Yiping Wang · Beidi Chen · Simon Du -
2023 Poster: Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure »
Angela Yuan · Chris Junchi Li · Gauthier Gidel · Michael Jordan · Quanquan Gu · Simon Du -
2023 Poster: Active representation learning for general task space with applications in robotics »
Yifang Chen · Yingbing Huang · Simon Du · Kevin Jamieson · Guanya Shi -
2023 Poster: No-Regret Online Prediction with Strategic Experts »
Omid Sadeghi · Maryam Fazel -
2023 Poster: A Reduction-based Framework for Sequential Decision Making with Delayed Feedback »
Yunchang Yang · Han Zhong · Tianhao Wu · Bin Liu · Liwei Wang · Simon Du -
2022 Spotlight: Lightning Talks 4A-4 »
Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare -
2022 Spotlight: Provable General Function Class Representation Learning in Multitask Bandits and MDP »
Rui Lu · Andrew Zhao · Simon Du · Gao Huang -
2022 Poster: When are Offline Two-Player Zero-Sum Markov Games Solvable? »
Qiwen Cui · Simon Du -
2022 Poster: Sampling with Riemannian Hamiltonian Monte Carlo in a Constrained Space »
Yunbum Kook · Yin-Tat Lee · Ruoqi Shen · Santosh Vempala -
2022 Poster: Learning in Congestion Games with Bandit Feedback »
Qiwen Cui · Zhihan Xiong · Maryam Fazel · Simon Du -
2022 Poster: Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus »
Qiwen Cui · Simon Du -
2022 Poster: On Gap-dependent Bounds for Offline Reinforcement Learning »
Xinqi Wang · Qiwen Cui · Simon Du -
2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? »
Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine -
2021 Poster: Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions »
Yin Tat Lee · Ruoqi Shen · Kevin Tian -
2021 Poster: Selective Sampling for Online Best-arm Identification »
Romain Camilleri · Zhihan Xiong · Maryam Fazel · Lalit Jain · Kevin Jamieson -
2021 Oral: Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions »
Yin Tat Lee · Ruoqi Shen · Kevin Tian -
2020 Poster: Generalized Leverage Score Sampling for Neural Networks »
Jason Lee · Ruoqi Shen · Zhao Song · Mengdi Wang · zheng Yu -
2019 Poster: The Randomized Midpoint Method for Log-Concave Sampling »
Ruoqi Shen · Yin Tat Lee -
2019 Spotlight: The Randomized Midpoint Method for Log-Concave Sampling »
Ruoqi Shen · Yin Tat Lee -
2012 Poster: Structured learning of Gaussian graphical models »
Karthik Mohan · Michael J Chung · Seungyeop Han · Daniela Witten · Su-In Lee · Maryam Fazel