
Threshold Bandits, With and Without Censored Feedback
Jacob D Abernethy · Kareem Amin · Ruihao Zhu

Mon Dec 05 09:00 AM -- 12:30 PM (PST) @ Area 5+6+7+8 #19
We consider the \emph{Threshold Bandit} setting, a variant of the classical multi-armed bandit problem in which the reward on each round depends on a piece of side information known as a \emph{threshold value}. The learner selects one of $K$ actions (arms); this action generates a random sample from a fixed distribution, and the learner receives a unit payoff if this sample exceeds the threshold value. We consider two versions of this problem, the \emph{uncensored} and \emph{censored} cases, which differ in whether the sample is always observed or observed only when the threshold is not met. Using new tools to understand the popular UCB algorithm, we show that the uncensored case is essentially no more difficult than the classical multi-armed bandit setting. Finally, we show that the censored case presents more challenges, but we give guarantees in the event that the sequence of threshold values is generated optimistically.
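To make the model concrete, here is a minimal Python simulation sketch. It is not the authors' algorithm or analysis. The unit-variance Gaussian arm distributions, the helper names `observe`, `ucb_index`, and `run_uncensored_ucb`, and the index form (an empirical estimate of $P(X_i > \tau_t)$ plus the standard UCB1 bonus $\sqrt{2\ln t / n_i}$) are all illustrative assumptions. Because every sample is observed in the uncensored case, the learner can re-estimate each arm's exceedance probability for whatever threshold arrives on a given round.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def observe(x: float, threshold: float, censored: bool) -> Tuple[int, Optional[float]]:
    """Return (reward, observation) for one pulled sample x.

    Reward is 1 when the sample exceeds the threshold, else 0.
    With censored feedback the sample is revealed only when the
    threshold is not met; otherwise the observation is None.
    """
    reward = 1 if x > threshold else 0
    if censored and reward == 1:
        return reward, None
    return reward, x


def ucb_index(samples: List[float], threshold: float, t: int) -> float:
    """Hypothetical UCB-style index: empirical P(X > threshold) plus a UCB1 bonus."""
    if not samples:
        return math.inf  # force each arm to be tried once
    n = len(samples)
    p_hat = sum(1 for x in samples if x > threshold) / n
    return p_hat + math.sqrt(2.0 * math.log(t) / n)


def run_uncensored_ucb(
    means: Sequence[float], thresholds: Sequence[float], seed: int = 0
) -> Tuple[int, List[int]]:
    """Simulate the uncensored setting with unit-variance Gaussian arms (an assumption)."""
    rng = random.Random(seed)
    k = len(means)
    samples: List[List[float]] = [[] for _ in range(k)]
    total_reward = 0
    for t, tau in enumerate(thresholds, start=1):
        # Score every arm against this round's threshold and pull the best one.
        arm = max(range(k), key=lambda i: ucb_index(samples[i], tau, t))
        x = rng.gauss(means[arm], 1.0)
        reward, obs = observe(x, tau, censored=False)
        total_reward += reward
        assert obs is not None  # uncensored feedback always reveals the sample
        samples[arm].append(obs)
    return total_reward, [len(s) for s in samples]


if __name__ == "__main__":
    total, pulls = run_uncensored_ucb(means=[0.0, 1.0], thresholds=[0.5] * 2000)
    print("total reward:", total, "pulls per arm:", pulls)
```

With a fixed threshold this reduces to an ordinary Bernoulli bandit, but the `thresholds` argument may vary from round to round. The `observe` helper also supports `censored=True`; the loop above, however, assumes every sample is recorded, so it does not carry over directly to the censored case, where samples above the threshold are never seen.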

Author Information

Jacob D Abernethy (University of Michigan)
Kareem Amin (University of Michigan)
Ruihao Zhu (MIT)
