Timezone: »
Poster
Near-Optimal Sample Complexity Bounds for Constrained MDPs
Sharan Vaswani · Lin Yang · Csaba Szepesvari
In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $\epsilon$-optimal policy with probability $1 - \delta$, by making $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ queries to the generative model, thus matching the sample-complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O} \left(\frac{S A \, \log(1/\delta)}{(1 - \gamma)^5 \, \epsilon^2 \zeta^2} \right)$ where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower-bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
Author Information
Sharan Vaswani (Simon Fraser University)
Lin Yang (UCLA)
Csaba Szepesvari (University of Alberta)
More from the Same Authors
-
2021 : Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent »
Sharan Vaswani · Benjamin Dubois-Taine · Reza Babanezhad Harikandeh -
2021 : Doubly Pessimistic Algorithms for Strictly Safe Off-Policy Optimization »
Sanae Amani · Lin Yang -
2022 : Target-based Surrogates for Stochastic Optimization »
Jonathan Lavington · Sharan Vaswani · Reza Babanezhad Harikandeh · Mark Schmidt · Nicolas Le Roux -
2022 : From Local to Global: Spectral-Inspired Graph Neural Networks »
Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman -
2023 Poster: Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL »
Qinghua Liu · Gellért Weisz · András György · Chi Jin · Csaba Szepesvari -
2023 Poster: Ordering-based Conditions for Global Convergence of Policy Gradient Methods »
Jincheng Mei · Bo Dai · Alekh Agarwal · Mohammad Ghavamzadeh · Csaba Szepesvari · Dale Schuurmans -
2023 Poster: Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds »
Jiayi Huang · Han Zhong · Liwei Wang · Lin Yang -
2023 Poster: Regret Minimization via Saddle Point Optimization »
Johannes Kirschner · Alireza Bakhtiari · Kushagra Chandak · Volodymyr Tkachuk · Csaba Szepesvari -
2023 Poster: Efficient Batched Algorithm for Contextual Linear Bandits with Large Action Space via Soft Elimination »
Osama Hanna · Lin Yang · Christina Fragouli -
2023 Poster: Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore »
Gellért Weisz · András György · Csaba Szepesvari -
2023 Poster: Context-lumpable stochastic bandits »
Chung-Wei Lee · Qinghua Liu · Yasin Abbasi Yadkori · Chi Jin · Tor Lattimore · Csaba Szepesvari -
2023 Poster: Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees »
Sharan Vaswani · Amirreza Kazemi · Reza Babanezhad Harikandeh · Nicolas Le Roux -
2023 Poster: Replicability in Reinforcement Learning »
Amin Karbasi · Grigoris Velegkas · Lin Yang · Felix Zhou -
2022 : Poster Session 1 »
Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li -
2022 Poster: Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning »
Dingwen Kong · Lin Yang -
2022 Poster: The Role of Baselines in Policy Gradient Optimization »
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2022 Poster: Sample-Efficient Reinforcement Learning of Partially Observable Markov Games »
Qinghua Liu · Csaba Szepesvari · Chi Jin -
2022 Poster: Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context »
Osama Hanna · Lin Yang · Christina Fragouli -
2022 Poster: Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs »
Gellért Weisz · András György · Tadashi Kozuno · Csaba Szepesvari -
2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization »
Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang -
2021 : Poster Session 1 (gather.town) »
Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani -
2021 Poster: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs »
Han Zhong · Jiayi Huang · Lin Yang · Liwei Wang -
2021 Poster: On the Value of Interaction and Function Approximation in Imitation Learning »
Nived Rajaraman · Yanjun Han · Lin Yang · Jingbo Liu · Jiantao Jiao · Kannan Ramchandran -
2021 Poster: Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2020 : Poster Session 2 (gather.town) »
Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang -
2020 : Contributed talks in Session 2 (Zoom) »
Martin Takac · Samuel Horváth · Guan-Horng Liu · Nicolas Loizou · Sharan Vaswani -
2020 : Contributed Video: Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search), Sharan Vaswani »
Sharan Vaswani -
2020 : Contributed Video: How to make your optimizer generalize better, Sharan Vaswani »
Sharan Vaswani -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: Toward the Fundamental Limits of Imitation Learning »
Nived Rajaraman · Lin Yang · Jiantao Jiao · Kannan Ramchandran -
2020 Poster: Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? »
Qiwen Cui · Lin Yang -
2020 Poster: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Spotlight: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension »
Ruosong Wang · Russ Salakhutdinov · Lin Yang -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster Session »
Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma -
2019 Poster: Efficient Symmetric Norm Regression via Linear Sketching »
Zhao Song · Ruosong Wang · Lin Yang · Hongyang Zhang · Peilin Zhong -
2019 Poster: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates »
Sharan Vaswani · Aaron Mishkin · Issam Laradji · Mark Schmidt · Gauthier Gidel · Simon Lacoste-Julien