Skip to yearly menu bar Skip to main content


Poster

Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits

Sivan Sabato

East Exhibition Hall B, C #22

Keywords: [ Learning Theory ] [ Theory ] [ Bandit Algorithms ] [ Algorithms ]


Abstract:

We study epsilon-best-arm identification, in a setting where during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting, that with a high probability returns an epsilon-best arm, while incurring a cost that depends only linearly on the total expected reward of all arms, and does not depend at all on the number of arms. Under mild assumptions, the algorithm can be applied also to problems with infinitely many arms.

Live content is unavailable. Log in and register to view live content