Poster
A Simple and Optimal Policy Design for Online Learning with Safety against Heavy-tailed Risk
David Simchi-Levi · Zeyu Zheng · Feng Zhu
We consider the classical multi-armed bandit problem and design simple-to-implement new policies that simultaneously enjoy two properties: worst-case optimality for the expected regret, and safety against heavy-tailed risk for the regret distribution. Recently, Fan and Glynn (2021) showed that information-theoretically optimized bandit policies as well as standard UCB policies suffer from serious heavy-tailed risk; that is, the probability of incurring a linear regret decays only at a polynomial rate in $1/T$ as $T$ (the time horizon) increases. Inspired by their result, we further show that any policy incurring an instance-dependent $O(\ln T)$ regret must incur a linear regret with probability $\Omega(\mathrm{poly}(1/T))$, so the heavy-tailed risk in fact exists for all "instance-dependent consistent" policies. Next, for the two-armed bandit setting, we provide a simple policy design that (i) achieves worst-case optimality for the expected regret at order $\tilde O(\sqrt{T})$ and (ii) has a worst-case tail probability of incurring a linear regret that decays at an exponential rate $\exp(-\Omega(\sqrt{T}))$. We further prove that this exponential decay rate of the tail probability is optimal across all policies that are worst-case optimal for the expected regret. Finally, we generalize the policy design and analysis to the general setting with an arbitrary number $K$ of arms, and provide a detailed characterization of the tail probability bound for any regret threshold under our policy design. Numerical experiments illustrate the theoretical findings. Our results reveal the incompatibility between consistency and light-tailed risk, while indicating that worst-case optimality for the expected regret and light-tailed risk are compatible.
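To make the two-armed setting concrete, the following is a minimal, self-contained simulation of a two-armed Bernoulli bandit under a simple explore-then-commit rule with $\sim\sqrt{T}$ exploration pulls per arm. This rule is an illustrative stand-in only, not the policy proposed in the paper; the function name and parameters are hypothetical.

```python
import random

def two_armed_bandit_regret(mu, policy_rounds=1000, seed=0):
    """Simulate a two-armed Bernoulli bandit and return the pseudo-regret.

    Uses a simple explore-then-commit rule: pull each arm ~sqrt(T) times,
    then commit to the arm with the higher empirical mean. This is an
    illustration of the problem setting, NOT the paper's policy design.

    mu: tuple of the two arms' Bernoulli means.
    """
    rng = random.Random(seed)
    T = policy_rounds
    m = int(T ** 0.5)          # ~sqrt(T) exploration pulls per arm
    sums = [0.0, 0.0]          # cumulative rewards per arm
    pulls = [0, 0]             # pull counts per arm
    regret = 0.0
    best = max(mu)
    for _ in range(T):
        if pulls[0] < m:       # explore arm 0 first
            a = 0
        elif pulls[1] < m:     # then explore arm 1
            a = 1
        else:                  # commit to the empirically better arm
            a = 0 if sums[0] / pulls[0] >= sums[1] / pulls[1] else 1
        reward = 1.0 if rng.random() < mu[a] else 0.0
        sums[a] += reward
        pulls[a] += 1
        regret += best - mu[a]  # pseudo-regret accumulates the mean gap
    return regret
```

Running many independent replications of such a simulation and inspecting the empirical distribution of the regret (rather than only its mean) is the kind of experiment that distinguishes light-tailed from heavy-tailed regret behavior.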
Author Information
David Simchi-Levi (MIT)
Zeyu Zheng (University of California, Berkeley)
Feng Zhu (Massachusetts Institute of Technology)
More from the Same Authors

2021 : Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation »
Dylan Foster · Akshay Krishnamurthy · David Simchi-Levi · Yunzong Xu 
2022 : Mind Your Step: Continuous Conditional GANs with Generator Regularization »
Yunkai Zhang · Yufeng Zheng · Amber Ma · Siyuan Teng · Zeyu Zheng 
2022 Poster: Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization »
Tianyi Lin · Zeyu Zheng · Michael Jordan 
2022 Poster: Learning Mixed Multinomial Logits with Provable Guarantees »
Yiqun Hu · David Simchi-Levi · Zhenzhen Yan 
2022 Poster: Context-Based Dynamic Pricing with Partially Linear Demand Model »
Jinzhi Bu · David Simchi-Levi · Chonghuan Wang 
2021 : Contributed Talk 3: Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation »
Yunzong Xu · Akshay Krishnamurthy · David Simchi-Levi 
2021 Poster: Stochastic $L^\natural$-convex Function Minimization »
Haixiang Zhang · Zeyu Zheng · Javad Lavaei 
2019 Poster: Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints »
David Simchi-Levi · Yunzong Xu 
2018 Poster: The Lingering of Gradients: How to Reuse Gradients Over Time »
Zeyuan Allen-Zhu · David Simchi-Levi · Xinshang Wang