Poster

Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits

Sivan Sabato

Keywords: Learning Theory Theory Bandit Algorithms Algorithms

2019 Poster

[ Paper] [ Poster]

Abstract

We study epsilon-best-arm identification, in a setting where during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting, that with a high probability returns an epsilon-best arm, while incurring a cost that depends only linearly on the total expected reward of all arms, and does not depend at all on the number of arms. Under mild assumptions, the algorithm can be applied also to problems with infinitely many arms.

Chat is not available.