Skip to yearly menu bar Skip to main content


Talk
in
Workshop: OPT2020: Optimization for Machine Learning

Contributed Video: Employing No Regret Learners for Pure Exploration in Linear Bandits, Mohammadi Zaki

Mohammadi Zaki


Abstract: We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence (δδ-PAC) setting; this is also the problem of optimizing an unknown linear function over a discrete ground set with noisy, zeroth-order access. We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem. Most previous approaches rely on access to a minimax optimization oracle which is at the heart of the complexity of the problem. We propose a method to solve this optimization problem (upto suitable accuracy) by interpreting the problem as a two-player zero-sum game, and attempting to sequentially converge to its saddle point using low-regret learners to compute the players' strategies in each round which yields a concrete querying algorithm. The algorithm, which we call the {\em Phased Elimination Linear Exploration Game} (PELEG), maintains a high-probability confidence ellipsoid containing θθ in each round and uses it to eliminate suboptimal arms in phases. We analyze the sample complexity of PELEG and show that it matches, up to order, an instance-dependent lower bound on sample complexity in the linear bandit setting without requiring boundedness assumptions on the parameter space. PELEG is, thus, the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.