Timezone: »
Poster
Constrained episodic reinforcement learning in concave-convex and knapsack settings
Kianté Brantley · Miro Dudik · Thodoris Lykouris · Sobhan Miryoosefi · Max Simchowitz · Aleksandrs Slivkins · Wen Sun
We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either the feasibility question or settings with a single episode. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
Author Information
Kianté Brantley (The University of Maryland College Park)
Miro Dudik (Microsoft Research)
Thodoris Lykouris (Microsoft Research NYC)
Sobhan Miryoosefi (Princeton University)
Max Simchowitz (Berkeley)
Aleksandrs Slivkins (Microsoft Research)
Wen Sun (Microsoft Research NYC)
More from the Same Authors
-
2021 Spotlight: Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms »
Chi Jin · Qinghua Liu · Sobhan Miryoosefi -
2021 Spotlight: Bayesian decision-making under misspecified priors with applications to meta-learning »
Max Simchowitz · Christopher Tosh · Akshay Krishnamurthy · Daniel Hsu · Thodoris Lykouris · Miro Dudik · Robert Schapire -
2021 : Exploration and Incentives in Reinforcement Learning »
Max Simchowitz · Aleksandrs Slivkins -
2021 : The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity »
Mark Sellke · Aleksandrs Slivkins -
2021 : Exploration and Incentives in Reinforcement Learning »
Max Simchowitz · Aleksandrs Slivkins -
2021 : The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity »
Mark Sellke · Aleksandrs Slivkins -
2022 : $\ell$Gym: Natural Language Visual Reasoning with Reinforcement Learning »
Anne Wu · Kianté Brantley · Noriyuki Kojima · Yoav Artzi -
2022 : Learning to Extrapolate: A Transductive Approach »
Aviv Netanyahu · Abhishek Gupta · Max Simchowitz · Kaiqing Zhang · Pulkit Agrawal -
2023 Poster: A Unified Model and Dimension for Interactive Estimation »
Nataly Brukhim · Miro Dudik · Aldo Pacchiano · Robert Schapire -
2023 Poster: Bandit Social Learning under Myopic Behavior »
Kiarash Banihashem · MohammadTaghi Hajiaghayi · Suho Shin · Aleksandrs Slivkins -
2023 Poster: Quantifying the Cost of Learning in Queueing Systems »
Daniel Freund · Thodoris Lykouris · Tsui-Wei Weng -
2022 Workshop: InterNLP: Workshop on Interactive Learning for Natural Language Processing »
Kianté Brantley · Soham Dan · Ji Ung Lee · Khanh Nguyen · Edwin Simpson · Alane Suhr · Yoav Artzi -
2022 Poster: Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions »
Adam Block · Max Simchowitz -
2022 Poster: Globally Convergent Policy Search for Output Estimation »
Jack Umenberger · Max Simchowitz · Juan Perdomo · Kaiqing Zhang · Russ Tedrake -
2022 Poster: Incentivizing Combinatorial Bandit Exploration »
Xinyan Hu · Dung Ngo · Aleksandrs Slivkins · Steven Wu -
2022 Poster: Provably sample-efficient RL with side information about latent dynamics »
Yao Liu · Dipendra Misra · Miro Dudik · Robert Schapire -
2021 : Spotlight 1: Exploration and Incentives in Reinforcement Learning »
Max Simchowitz · Aleksandrs Slivkins -
2021 Poster: Online Control of Unknown Time-Varying Dynamical Systems »
Edgar Minasyan · Paula Gradu · Max Simchowitz · Elad Hazan -
2021 Poster: Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms »
Chi Jin · Qinghua Liu · Sobhan Miryoosefi -
2021 Poster: Stabilizing Dynamical Systems via Policy Gradient Methods »
Juan Perdomo · Jack Umenberger · Max Simchowitz -
2021 Poster: Bayesian decision-making under misspecified priors with applications to meta-learning »
Max Simchowitz · Christopher Tosh · Akshay Krishnamurthy · Daniel Hsu · Thodoris Lykouris · Miro Dudik · Robert Schapire -
2021 Poster: Bandits with Knapsacks beyond the Worst Case »
Karthik Abinav Sankararaman · Aleksandrs Slivkins -
2020 Poster: Making Non-Stochastic Control (Almost) as Easy as Stochastic »
Max Simchowitz -
2020 Session: Orals & Spotlights Track 20: Social/Adversarial Learning »
Steven Wu · Miro Dudik -
2020 Poster: Efficient Contextual Bandits with Continuous Actions »
Maryam Majzoubi · Chicheng Zhang · Rajan Chari · Akshay Krishnamurthy · John Langford · Aleksandrs Slivkins -
2020 Poster: Learning the Linear Quadratic Regulator from Nonlinear Observations »
Zakaria Mhammedi · Dylan Foster · Max Simchowitz · Dipendra Misra · Wen Sun · Akshay Krishnamurthy · Alexander Rakhlin · John Langford -
2019 Poster: Reinforcement Learning with Convex Constraints »
Sobhan Miryoosefi · Kianté Brantley · Hal Daumé III · Miro Dudik · Robert Schapire -
2019 Poster: Policy Poisoning in Batch Reinforcement Learning and Control »
Yuzhe Ma · Xuezhou Zhang · Wen Sun · Jerry Zhu -
2019 Poster: Optimal Sketching for Kronecker Product Regression and Low Rank Approximation »
Huaian Diao · Rajesh Jayaram · Zhao Song · Wen Sun · David Woodruff -
2019 Poster: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs »
Max Simchowitz · Kevin Jamieson -
2017 : The Unfair Externalities of Exploration »
Aleksandrs Slivkins · Jennifer Wortman Vaughan -
2017 Poster: Off-policy evaluation for slate recommendation »
Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni -
2017 Poster: A Decomposition of Forecast Error in Prediction Markets »
Miro Dudik · Sebastien Lahaie · Ryan Rogers · Jennifer Wortman Vaughan -
2017 Oral: Off-policy evaluation for slate recommendation »
Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni -
2016 Poster: Contextual semibandits via supervised learning oracles »
Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik -
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck -
2011 Poster: Multi-armed bandits on implicit metric spaces »
Aleksandrs Slivkins -
2009 Poster: Adapting to the Shifting Intent of Search Queries »
Umar Syed · Aleksandrs Slivkins · Nina Mishra