Bandits with Knapsacks beyond the Worst Case
Karthik Abinav Sankararaman · Aleksandrs Slivkins

Tue Dec 07 08:30 AM -- 10:00 AM (PST)

Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization of logarithmic, instance-dependent regret rates. Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a "general reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our results build on the BwK algorithm from prior work, providing new analyses thereof.
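
For readers new to the model, the following is a minimal sketch of the BwK interaction protocol, under simplifying assumptions that are not taken from the paper: a single resource, Bernoulli rewards and consumption, and a placeholder uniform-random policy. The names `run_bwk`, `uniform_policy`, and all parameter values are hypothetical and only illustrate how the budget constraint ends an episode early. This is not the algorithm analyzed in the paper.

```python
import random


def run_bwk(reward_means, cost_means, budget, horizon, choose_arm, seed=0):
    """Simulate one BwK episode with a single resource (illustrative only).

    Each pull of arm a yields a Bernoulli(reward_means[a]) reward and consumes
    a Bernoulli(cost_means[a]) amount of the resource. The episode stops after
    `horizon` rounds or once the budget is used up, whichever comes first.
    """
    rng = random.Random(seed)
    remaining = budget
    total_reward = 0
    rounds_played = 0
    for t in range(horizon):
        if remaining < 1:  # budget exhausted: the episode ends early
            break
        arm = choose_arm(t, rng)
        reward = 1 if rng.random() < reward_means[arm] else 0
        cost = 1 if rng.random() < cost_means[arm] else 0
        total_reward += reward
        remaining -= cost
        rounds_played += 1
    return total_reward, rounds_played, remaining


def uniform_policy(num_arms):
    """Placeholder policy (assumption): pick an arm uniformly at random."""
    return lambda t, rng: rng.randrange(num_arms)


# Hypothetical instance: arm 0 pays well but is expensive, arm 1 is cheap.
total, rounds, left = run_bwk(
    reward_means=[0.9, 0.4],
    cost_means=[1.0, 0.1],
    budget=20,
    horizon=200,
    choose_arm=uniform_policy(2),
)
print(total, rounds, left)
```

In this toy instance, a policy that ignores consumption tends to exhaust the budget long before the time horizon, which is the tension between reward and resource use that BwK algorithms must manage.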

Author Information

Karthik Abinav Sankararaman (University of Maryland)

PhD student at UMD, broadly interested in the intersection of machine learning, operations research, and theoretical computer science.

Aleksandrs Slivkins (Microsoft Research NYC)

More from the Same Authors