Timezone: »
Poster
Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function
Aviv Rosenberg · Yishay Mansour
Thu Dec 12 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #23
We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.
The transition function is fixed but unknown to the learner, and the learner only observes bandit feedback (not the entire loss function).
For this problem we develop no-regret algorithms that perform asymptotically as well as the best stationary policy in hindsight.
Assuming that all states are reachable with probability $\beta > 0$ under any policy, we give a regret bound of $\tilde{O} ( L|X|\sqrt{|A|T} / \beta )$, where $T$ is the number of episodes, $X$ is the state space, $A$ is the action space, and $L$ is the length of each episode.
When this assumption is removed we give a regret bound of $\tilde{O} ( L^{3/2} |X| |A|^{1/4} T^{3/4})$, that holds for an arbitrary transition function.
To our knowledge these are the first algorithms that in our setting handle both bandit feedback and an unknown transition function.
Author Information
Aviv Rosenberg (Tel Aviv University)
I am an Applied Scientist at Amazon Alexa Shopping, Tel Aviv. Previously, I obtained my PhD from the department of computer science at Tel Aviv University, where I was fortunate to have Prof. Yishay Mansour as my advisor. Prior to that, I received my Bachelor's degree in Mathematics and Computer Science from Tel Aviv University. My primary research interest lies in theoretical and applied machine learning. More specifically, my PhD focused on data-driven sequential decision making such as reinforcement learning, online learning and multi-armed bandit.
Yishay Mansour (Tel Aviv University / Google)
More from the Same Authors
-
2022 : Finding Safe Zones of Markov Decision Processes Policies »
Lee Cohen · Yishay Mansour · Michal Moshkovitz -
2023 Poster: Multiclass Boosting: Simple and Intuitive Weak Learning Criteria »
Nataly Brukhim · Amit Daniely · Yishay Mansour · Shay Moran -
2023 Poster: Finding Safe Zones of Markov Decision Processes Policies »
Lee Cohen · Yishay Mansour · Michal Moshkovitz -
2023 Poster: Blackbox Differential Privacy for Interactive ML »
Haim Kaplan · Yishay Mansour · Shay Moran · Kobbi Nissim · Uri Stemmer -
2023 Poster: Provably Efficient Personalized Multi-Objective Decision Making via Comparative Feedback »
Han Shao · Lee Cohen · Avrim Blum · Yishay Mansour · Aadirupa Saha · Matthew Walter -
2022 Poster: Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2021 Workshop: Learning in Presence of Strategic Behavior »
Omer Ben-Porat · Nika Haghtalab · Annie Liang · Yishay Mansour · David Parkes -
2021 Poster: Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 Poster: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2020 Poster: Sample Complexity of Uniform Convergence for Multicalibration »
Eliran Shabat · Lee Cohen · Yishay Mansour -
2020 Poster: Prediction with Corrupted Expert Advice »
Idan Amir · Idan Attias · Tomer Koren · Yishay Mansour · Roi Livni -
2020 Spotlight: Prediction with Corrupted Expert Advice »
Idan Amir · Idan Attias · Tomer Koren · Yishay Mansour · Roi Livni -
2020 Poster: Adversarially Robust Streaming Algorithms via Differential Privacy »
Avinatan Hassidim · Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2020 Poster: Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity »
Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2020 Oral: Adversarially Robust Streaming Algorithms via Differential Privacy »
Avinatan Hassidim · Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 1 »
David Brandfonbrener · Joan Bruna · Tom Zahavy · Haim Kaplan · Yishay Mansour · Nikos Karampatziakis · John Langford · Paul Mineiro · Donghwan Lee · Niao He -
2019 Poster: Graph-based Discriminators: Sample Complexity and Expressiveness »
Roi Livni · Yishay Mansour -
2019 Spotlight: Graph-based Discriminators: Sample Complexity and Expressiveness »
Roi Livni · Yishay Mansour -
2019 Poster: Learning to Screen »
Alon Cohen · Avinatan Hassidim · Haim Kaplan · Yishay Mansour · Shay Moran -
2019 Poster: Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits »
Yogev Bar-On · Yishay Mansour -
2016 : Robust Learning and Inference »
Yishay Mansour -
2016 Poster: Online Pricing with Strategic and Patient Buyers »
Michal Feldman · Tomer Koren · Roi Livni · Yishay Mansour · Aviv Zohar -
2013 Poster: From Bandits to Experts: A Tale of Domination and Independence »
Noga Alon · Nicolò Cesa-Bianchi · Claudio Gentile · Yishay Mansour -
2013 Oral: From Bandits to Experts: A Tale of Domination and Independence »
Noga Alon · Nicolò Cesa-Bianchi · Claudio Gentile · Yishay Mansour