Poster
The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
Fang Kong · Yueran Yang · Wei Chen · Shuai Li
Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s, but its theoretical guarantees were not established until recent years. All existing analyses of TS in the combinatorial multi-armed bandit (CMAB) setting require an exact oracle that returns an optimal solution for any input. However, such an oracle is usually not feasible, since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example \cite{WangC18} has shown that TS can fail to learn with an approximation oracle. However, that oracle is uncommon and is designed only for a specific problem instance. It remains an open question whether the convergence analysis of TS can be extended beyond the exact oracle in CMAB. In this paper, we study this question under the greedy oracle, a common (approximation) oracle with theoretical guarantees for many (offline) combinatorial optimization problems. We provide a problem-dependent regret lower bound of order $\Omega(\log T/\Delta^2)$ to quantify the hardness of TS in solving CMAB problems with the greedy oracle, where $T$ is the time horizon and $\Delta$ is some reward gap. We also provide an almost matching regret upper bound. These are the first theoretical results for TS solving CMAB with a common approximation oracle, and they break the misconception that TS cannot work with approximation oracles.
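The interaction the abstract describes (sample from the posterior, pass the sample to a greedy oracle, play the returned super arm, update on semi-bandit feedback) can be sketched as follows. This is a minimal illustration and not the authors' algorithm or experimental setup: the Bernoulli arms, Beta(1, 1) priors, cardinality constraint `k`, and the names `greedy_oracle`, `linear_reward`, and `thompson_sampling_greedy` are all assumptions made for the example. With the linear reward used here the greedy choice happens to be optimal; the setting studied in the paper is more interesting when greedy is only an approximation.

```python
import random


def linear_reward(super_arm, theta):
    """Hypothetical reward function: sum of the (sampled) means of the chosen arms."""
    return sum(theta[i] for i in super_arm)


def greedy_oracle(theta, k, reward=linear_reward):
    """Build a size-k super arm by repeatedly adding the arm with the best resulting reward."""
    chosen = []
    remaining = set(range(len(theta)))
    for _ in range(k):
        # Ties are broken toward the smallest index because candidates are sorted.
        best = max(sorted(remaining), key=lambda i: reward(chosen + [i], theta))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def thompson_sampling_greedy(mu, k, horizon, reward=linear_reward, seed=0):
    """Run TS with a greedy oracle on Bernoulli arms; return per-arm pull counts."""
    rng = random.Random(seed)
    m = len(mu)
    alpha = [1] * m  # Beta posterior parameters (assumed uniform prior)
    beta = [1] * m
    pulls = [0] * m
    for _ in range(horizon):
        # 1. Sample a mean vector from the current posterior.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(m)]
        # 2. Ask the greedy oracle for a super arm under the sampled means.
        super_arm = greedy_oracle(theta, k, reward)
        # 3. Semi-bandit feedback: observe each played arm's outcome and update.
        for i in super_arm:
            outcome = 1 if rng.random() < mu[i] else 0
            alpha[i] += outcome
            beta[i] += 1 - outcome
            pulls[i] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling_greedy([0.9, 0.8, 0.2, 0.1], k=2, horizon=2000))
```

Swapping `linear_reward` for a non-additive reward (for example a coverage-style function) turns `greedy_oracle` into a genuine approximation oracle, which is the regime the regret bounds above address.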
Author Information
Fang Kong (Shanghai Jiao Tong University)
Yueran Yang (Shanghai Jiao Tong University)
Wei Chen (Microsoft Research)
Shuai Li (Shanghai Jiao Tong University)
More from the Same Authors
-
2021 : User-in-the-Loop Named Entity Recognition via Counterfactual Learning »
Tong Yu · Junda Wu · Ruiyi Zhang · Handong Zhao · Shuai Li -
2021 : Sim-to-Real Interactive Recommendation via Off-Dynamics Reinforcement Learning »
Junda Wu · Zhihui Xie · Tong Yu · Qizhi Li · Shuai Li -
2022 Poster: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2023 Poster: Scalable Fair Influence Maximization »
Xiaobin Rui · Zhixiao Wang · Jiayu Zhao · Lichao Sun · Wei Chen -
2023 Poster: Online Corrupted User Detection and Regret Minimization »
Zhiyong Wang · Jize Xie · Tong Yu · Shuai Li · John C.S. Lui -
2023 Poster: Multi-Fidelity Multi-Armed Bandits Revisited »
Xuchuang Wang · Qingyun Wu · Wei Chen · John C.S. Lui -
2023 Poster: InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding »
Junda Wu · Tong Yu · Rui Wang · Zhao Song · Ruiyi Zhang · Handong Zhao · Chaochao Lu · Shuai Li · Ricardo Henao -
2023 Poster: Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback »
Canzhe Zhao · Ruofeng Yang · Baoxiang Wang · Xuezhou Zhang · Shuai Li -
2023 Poster: Adversarial Attacks on Online Learning to Rank with Click Feedback »
Jinhang Zuo · Zhiyao Zhang · Zhiyong Wang · Shuai Li · Mohammad Hajiesmaili · Adam Wierman -
2023 Poster: Online Clustering of Bandits with Misspecified User Models »
Zhiyong Wang · Jize Xie · Xutong Liu · Shuai Li · John C.S. Lui -
2022 Spotlight: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4A-1 »
Jiawei Huang · Su Jia · Abdurakhmon Sadiev · Ruomin Huang · Yuanyu Wan · Denizalp Goktas · Jiechao Guan · Andrew Li · Wei-Wei Tu · Li Zhao · Amy Greenwald · Jiawei Huang · Dmitry Kovalev · Yong Liu · Wenjie Liu · Peter Richtarik · Lijun Zhang · Zhiwu Lu · R Ravi · Tao Qin · Wei Chen · Hu Ding · Nan Jiang · Tie-Yan Liu -
2022 Poster: Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms »
Xutong Liu · Jinhang Zuo · Siwei Wang · Carlee Joe-Wong · John C.S. Lui · Wei Chen -
2021 Poster: Understanding Bandits with Graph Feedback »
Houshuang Chen · Zengfeng Huang · Shuai Li · Chihao Zhang -
2021 Poster: Combinatorial Pure Exploration with Bottleneck Reward Function »
Yihan Du · Yuko Kuroki · Wei Chen -
2020 Poster: Online Influence Maximization under Linear Threshold Model »
Shuai Li · Fang Kong · Kejie Tang · Qizhi Li · Wei Chen -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · Siddhartha Satpathi · Xueqing Liu · Andreu Vall -
2019 Poster: Adaptive Influence Maximization with Myopic Feedback »
Binghui Peng · Wei Chen -
2018 Poster: TopRank: A practical algorithm for online stochastic ranking »
Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari -
2018 Poster: Community Exploration: From Offline Optimization to Online Learning »
Xiaowei Chen · Weiran Huang · Wei Chen · John C. S. Lui -
2017 Poster: Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications »
Qinshi Wang · Wei Chen -
2017 Poster: Influence Maximization with $\varepsilon$-Almost Submodular Threshold Functions »
Qiang Li · Wei Chen · Xiaoming Sun (Institute of Computing) · Jialin Zhang (Institute of Computing) -
2016 Poster: Combinatorial Multi-Armed Bandit with General Reward Functions »
Wei Chen · Wei Hu · Fu Li · Jian Li · Yu Liu · Pinyan Lu -
2015 Poster: Stochastic Online Greedy Learning with Semi-bandit Feedbacks »
Tian Lin · Jian Li · Wei Chen -
2014 Poster: Combinatorial Pure Exploration of Multi-Armed Bandits »
Shouyuan Chen · Tian Lin · Irwin King · Michael R Lyu · Wei Chen -
2014 Oral: Combinatorial Pure Exploration of Multi-Armed Bandits »
Shouyuan Chen · Tian Lin · Irwin King · Michael R Lyu · Wei Chen