Timezone: »
Poster
Minimax Regret for Stochastic Shortest Path
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent has no prior knowledge about the costs and dynamics of the model. She repeatedly interacts with the model for $K$ episodes, and has to minimize her regret. In this work we show that the minimax regret for this setting is $\widetilde O(\sqrt{ (B_\star^2 + B_\star) |S| |A| K})$ where $B_\star$ is a bound on the expected cost of the optimal policy from any state, $S$ is the state space, and $A$ is the action space. This matches the $\Omega (\sqrt{ B_\star^2 |S| |A| K})$ lower bound of Rosenberg et al. [2020] for $B_\star \ge 1$, and improves their regret bound by a factor of $\sqrt{|S|}$. For $B_\star < 1$ we prove a matching lower bound of $\Omega (\sqrt{ B_\star |S| |A| K})$. Our algorithm is based on a novel reduction from SSP to finite-horizon MDPs. To that end, we provide an algorithm for the finite-horizon setting whose leading term in the regret depends polynomially on the expected cost of the optimal policy and only logarithmically on the horizon.
Author Information
Alon Cohen (Technion, Google)
Yonathan Efroni (Microsoft Research, New York)
Yishay Mansour (Tel Aviv University & Google)
Aviv Rosenberg (Tel Aviv University)
I am an Applied Scientist at Amazon Alexa Shopping, Tel Aviv. Previously, I obtained my PhD from the department of computer science at Tel Aviv University, where I was fortunate to have Prof. Yishay Mansour as my advisor. Prior to that, I received my Bachelor's degree in Mathematics and Computer Science from Tel Aviv University. My primary research interest lies in theoretical and applied machine learning. More specifically, my PhD focused on data-driven sequential decision making such as reinforcement learning, online learning and multi-armed bandit.
More from the Same Authors
-
2021 Spotlight: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations »
Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan -
2021 Spotlight: RL for Latent MDPs: Regret Guarantees and a Lower Bound »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2021 : Bandits with Partially Observable Confounded Data »
Guy Tennenholtz · Uri Shalit · Shie Mannor · Yonathan Efroni -
2021 : Reinforcement Learning in Reward-Mixing MDPs »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 : A Theory of Learning with Competing Objectives and User Feedback »
Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri -
2022 : Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information »
Riashat Islam · Manan Tomar · Alex Lamb · Hongyu Zang · Yonathan Efroni · Dipendra Misra · Aniket Didolkar · Xin Li · Harm Van Seijen · Remi Tachet des Combes · John Langford -
2022 : A Theory of Learning with Competing Objectives and User Feedback »
Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri -
2022 : Finding Safe Zones of Markov Decision Processes Policies »
Michal Moshkovitz · Lee Cohen · Yishay Mansour -
2022 : A Theory of Learning with Competing Objectives and User Feedback »
Pranjal Awasthi · Corinna Cortes · Yishay Mansour · Mehryar Mohri -
2022 Poster: Benign Underfitting of Stochastic Gradient Descent »
Tomer Koren · Roi Livni · Yishay Mansour · Uri Sherman -
2022 Poster: A Characterization of Semi-Supervised Adversarially Robust PAC Learnability »
Idan Attias · Steve Hanneke · Yishay Mansour -
2022 Poster: Tractable Optimality in Episodic Latent MABs »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2022 Poster: Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2022 Poster: Fair Wrapping for Black-box Predictions »
Alexander Soen · Ibrahim Alabdulmohsin · Sanmi Koyejo · Yishay Mansour · Nyalleng Moorosi · Richard Nock · Ke Sun · Lexing Xie -
2021 Oral: Optimal Rates for Random Order Online Optimization »
Uri Sherman · Tomer Koren · Yishay Mansour -
2021 Poster: Optimal Rates for Random Order Online Optimization »
Uri Sherman · Tomer Koren · Yishay Mansour -
2021 Poster: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 Poster: Asynchronous Stochastic Optimization Robust to Arbitrary Delays »
Alon Cohen · Amit Daniely · Yoel Drori · Tomer Koren · Mariano Schain -
2021 Poster: RL for Latent MDPs: Regret Guarantees and a Lower Bound »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2021 Poster: Differentially Private Multi-Armed Bandits in the Shuffle Model »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2021 Poster: ROI Maximization in Stochastic Online Decision-Making »
Nicolò Cesa-Bianchi · Tom Cesari · Yishay Mansour · Vianney Perchet -
2021 Poster: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations »
Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan -
2021 Poster: Dueling Bandits with Team Comparisons »
Lee Cohen · Ulrike Schmidt-Kraepelin · Yishay Mansour -
2021 Poster: Reinforcement Learning in Reward-Mixing MDPs »
Jeongyeol Kwon · Yonathan Efroni · Constantine Caramanis · Shie Mannor -
2020 : Contributed Talk: Mirror Descent Policy Optimization »
Manan Tomar · Lior Shani · Yonathan Efroni · Mohammad Ghavamzadeh -
2020 Poster: Online Planning with Lookahead Policies »
Yonathan Efroni · Mohammad Ghavamzadeh · Shie Mannor -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Adaptive Trust Region Policy Optimization: Convergence and Faster Rates of regularized MDPs »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Poster: Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies »
Yonathan Efroni · Nadav Merlis · Mohammad Ghavamzadeh · Shie Mannor -
2019 Spotlight: Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies »
Yonathan Efroni · Nadav Merlis · Mohammad Ghavamzadeh · Shie Mannor -
2018 Poster: Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2018 Spotlight: Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning »
Yonathan Efroni · Gal Dalal · Bruno Scherrer · Shie Mannor -
2017 Workshop: Learning in the Presence of Strategic Behavior »
Nika Haghtalab · Yishay Mansour · Tim Roughgarden · Vasilis Syrgkanis · Jennifer Wortman Vaughan -
2017 Poster: Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues »
Noga Alon · Moshe Babaioff · Yannai A. Gonczarowski · Yishay Mansour · Shay Moran · Amir Yehudayoff -
2017 Spotlight: Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues »
Noga Alon · Moshe Babaioff · Yannai A. Gonczarowski · Yishay Mansour · Shay Moran · Amir Yehudayoff -
2017 Poster: Multi-Armed Bandits with Metric Movement Costs »
Tomer Koren · Roi Livni · Yishay Mansour