Timezone: »
Poster
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin · Longbo Huang · Haipeng Luo
We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\log T)$ regret when the losses are (almost) stochastic. Recent work by [Jin and Luo, 2020] achieves this goal when the fixed transition is known, and leaves the case of unknown transition as a major open question. In this work, we resolve this open problem by using the same Follow-the-Regularized-Leader (FTRL) framework together with a set of new techniques. Specifically, we first propose a loss-shifting trick in the FTRL analysis, which greatly simplifies the approach of [Jin and Luo, 2020] and already improves their results for the known transition case. Then, we extend this idea to the unknown transition case and develop a novel analysis which upper bounds the transition estimation error by the regret itself in the stochastic setting, a key property to ensure $\mathcal{O}(\log T)$ regret.
Author Information
Tiancheng Jin (University of Southern California)
Longbo Huang (IIIS, Tsinghua Univeristy)
Haipeng Luo (University of Southern California)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Oral: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition »
Tue. Dec 7th 09:40 -- 09:55 AM Room
More from the Same Authors
-
2021 : Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2022 Poster: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 : Why (and When) does Local SGD Generalize Better than SGD? »
Xinran Gu · Kaifeng Lyu · Longbo Huang · Sanjeev Arora -
2022 : Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games »
Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo -
2022 : Online Min-max Optimization: Nonconvexity, Nonstationarity, and Dynamic Regret »
Yu Huang · Yuan Cheng · Yingbin Liang · Longbo Huang -
2022 Spotlight: Lightning Talks 4A-2 »
Barakeel Fanseu Kamhoua · Hualin Zhang · Taiki Miyagawa · Tomoya Murata · Xin Lyu · Yan Dai · Elena Grigorescu · Zhipeng Tu · Lijun Zhang · Taiji Suzuki · Wei Jiang · Haipeng Luo · Lin Zhang · Xi Wang · Young-San Lin · Huan Xiong · Liyu Chen · Bin Gu · Jinfeng Yi · Yongqiang Chen · Sandeep Silwal · Yiguang Hong · Maoyuan Song · Lei Wang · Tianbao Yang · Han Yang · MA Kaili · Samson Zhou · Deming Yuan · Bo Han · Guodong Shi · Bo Li · James Cheng -
2022 Spotlight: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback »
Yan Dai · Haipeng Luo · Liyu Chen -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 Poster: Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments »
Liyu Chen · Haipeng Luo -
2022 Poster: Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games »
Ioannis Anagnostides · Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo · Tuomas Sandholm -
2022 Poster: Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2022 Poster: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback »
Yan Dai · Haipeng Luo · Liyu Chen -
2022 Poster: Near-Optimal No-Regret Learning Dynamics for General Convex Games »
Gabriele Farina · Ioannis Anagnostides · Haipeng Luo · Chung-Wei Lee · Christian Kroer · Tuomas Sandholm -
2021 Poster: Multi-Agent Reinforcement Learning in Stochastic Networked Systems »
Yiheng Lin · Guannan Qu · Longbo Huang · Adam Wierman -
2021 Poster: Last-iterate Convergence in Extensive-Form Games »
Chung-Wei Lee · Christian Kroer · Haipeng Luo -
2021 Poster: Continuous Mean-Covariance Bandits »
Yihan Du · Siwei Wang · Zhixuan Fang · Longbo Huang -
2021 Poster: Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path »
Liyu Chen · Mehdi Jafarnia-Jahromi · Rahul Jain · Haipeng Luo -
2021 Poster: Fast Federated Learning in the Presence of Arbitrary Device Unavailability »
Xinran Gu · Kaixuan Huang · Jingzhao Zhang · Longbo Huang -
2021 Poster: What Makes Multi-Modal Learning Better than Single (Provably) »
Yu Huang · Chenzhuang Du · Zihui Xue · Xuanyao Chen · Hang Zhao · Longbo Huang -
2021 Poster: Regularized Softmax Deep Multi-Agent Q-Learning »
Ling Pan · Tabish Rashid · Bei Peng · Longbo Huang · Shimon Whiteson -
2021 Poster: Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses »
Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee -
2020 Poster: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang -
2020 Poster: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition »
Tiancheng Jin · Haipeng Luo -
2020 Spotlight: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition »
Tiancheng Jin · Haipeng Luo -
2020 Oral: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs »
Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang -
2020 Poster: Softmax Deep Double Deterministic Policy Gradients »
Ling Pan · Qingpeng Cai · Longbo Huang -
2020 Poster: Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits »
Siwei Wang · Longbo Huang · John C. S. Lui -
2020 Poster: Comparator-Adaptive Convex Bandits »
Dirk van der Hoeven · Ashok Cutkosky · Haipeng Luo -
2019 Poster: Equipping Experts/Bandits with Long-term Memory »
Kai Zheng · Haipeng Luo · Ilias Diakonikolas · Liwei Wang -
2019 Poster: Model Selection for Contextual Bandits »
Dylan Foster · Akshay Krishnamurthy · Haipeng Luo -
2019 Spotlight: Model Selection for Contextual Bandits »
Dylan Foster · Akshay Krishnamurthy · Haipeng Luo -
2019 Poster: Double Quantization for Communication-Efficient Distributed Optimization »
Yue Yu · Jiaxiang Wu · Longbo Huang -
2019 Poster: Hypothesis Set Stability and Generalization »
Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan -
2018 Poster: Multi-armed Bandits with Compensation »
Siwei Wang · Longbo Huang