Timezone: »
Poster
Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits
Sivan Sabato
Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #22
We study epsilon-best-arm identification, in a setting where during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting, that with a high probability returns an epsilon-best arm, while incurring a cost that depends only linearly on the total expected reward of all arms, and does not depend at all on the number of arms. Under mild assumptions, the algorithm can be applied also to problems with infinitely many arms.
Author Information
Sivan Sabato (Ben-Gurion University of the Negev)
More from the Same Authors
-
2021 Poster: A Constant Approximation Algorithm for Sequential Random-Order No-Substitution k-Median Clustering »
Tom Hess · Michal Moshkovitz · Sivan Sabato -
2018 Poster: Learning from discriminative feature feedback »
Sanjoy Dasgupta · Sivan Sabato · Nicholas Roberts · Akansha Dey -
2017 Poster: Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions »
Aryeh Kontorovich · Sivan Sabato · Roi Weiss -
2016 Poster: Active Nearest-Neighbor Learning in Metric Spaces »
Aryeh Kontorovich · Sivan Sabato · Ruth Urner -
2014 Poster: Active Regression by Stratification »
Sivan Sabato · Remi Munos -
2013 Poster: Auditing: Active Learning with Outcome-Dependent Query Costs »
Sivan Sabato · Anand D Sarwate · Nati Srebro -
2012 Poster: Multiclass Learning Approaches: A Theoretical Comparison with Implications »
Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz -
2012 Spotlight: Multiclass Learning Approaches: A Theoretical Comparison with Implications »
Amit Daniely · Sivan Sabato · Shai Shalev-Shwartz -
2010 Poster: Tight Sample Complexity of Large-Margin Learning »
Sivan Sabato · Nati Srebro · Naftali Tishby