Timezone: »
The multiarmed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the tradeoff between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (""arms""), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now wellunderstood, a lot of recent work has focused on multiarmed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit  it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a treebased classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem.
Author Information
Alex Slivkins Slivkins (Microsoft Research)
More from the Same Authors

2020 Poster: Efficient Contextual Bandits with Continuous Actions »
Maryam Majzoubi · Chicheng Zhang · Rajan Chari · Akshay Krishnamurthy · John Langford · Aleksandrs Slivkins 
2020 Poster: Constrained episodic reinforcement learning in concaveconvex and knapsack settings »
KiantÃ© Brantley · Miro Dudik · Thodoris Lykouris · Sobhan Miryoosefi · Max Simchowitz · Aleksandrs Slivkins · Wen Sun 
2009 Poster: Adapting to the Shifting Intent of Search Queries »
Umar Syed · Aleksandrs Slivkins · Nina Mishra