
Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits
Sivan Sabato

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #22

We study epsilon-best-arm identification in a setting where, during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting that, with high probability, returns an epsilon-best arm while incurring a cost that depends only linearly on the total expected reward of all arms and does not depend at all on the number of arms. Under mild assumptions, the algorithm can also be applied to problems with infinitely many arms.
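To make the Pay-Per-Reward cost model concrete, here is a minimal sketch of how exploration cost accumulates. It is NOT the algorithm from this work. It is a naive uniform-exploration baseline built on several assumptions: arms have Bernoulli rewards, the proportionality constant between cost and expected reward is 1, and `uniform_explore` and `pulls_per_arm` are hypothetical names introduced only for illustration.

```python
import random


def uniform_explore(means, pulls_per_arm, seed=0):
    """Naive uniform-exploration baseline (hypothetical; NOT the paper's algorithm).

    Assumed Pay-Per-Reward cost model: each pull of arm i costs exactly its
    expected reward means[i] (proportionality constant taken to be 1).
    Assumed reward model: arm i yields a Bernoulli(means[i]) reward.
    Returns (index of the empirically best arm, total exploration cost).
    """
    rng = random.Random(seed)
    total_cost = 0.0
    estimates = []
    for mu in means:
        successes = 0
        for _ in range(pulls_per_arm):
            # Draw a Bernoulli(mu) reward for this pull.
            successes += 1 if rng.random() < mu else 0
            # Pay-Per-Reward: the pull costs the arm's expected reward.
            total_cost += mu
        estimates.append(successes / pulls_per_arm)
    # Return the arm with the highest empirical mean reward.
    best = max(range(len(means)), key=lambda i: estimates[i])
    return best, total_cost


best, cost = uniform_explore([0.1, 0.2, 0.9], pulls_per_arm=100)
print(best, round(cost, 6))
```

For this baseline, the cost does not depend on the random draws. It always equals `pulls_per_arm` times the sum of the arms' expected rewards, here 100 × 1.2 = 120. This shows why, in this setting, the bill is naturally measured against the total expected reward of all arms rather than against the number of pulls alone.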

Author Information

Sivan Sabato (Ben-Gurion University of the Negev)
