Timezone: »
Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates.Second, we consider "simple regret" in BwK, which tracks algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a "general
reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our results build on the BwK algorithm from prior work, providing new analyses thereof.
Author Information
Karthik Abinav Sankararaman (University of Maryland)
PhD student in UMD intereseted broadly in the intersection of machine learning, operations research and theoretical computer science.
Aleksandrs Slivkins (Microsoft Research NYC)
More from the Same Authors
-
2021 : Exploration and Incentives in Reinforcement Learning »
Max Simchowitz · Aleksandrs Slivkins -
2021 : The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity »
Mark Sellke · Aleksandrs Slivkins -
2021 : Exploration and Incentives in Reinforcement Learning »
Max Simchowitz · Aleksandrs Slivkins -
2021 : The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity »
Mark Sellke · Aleksandrs Slivkins -
2022 Poster: Incentivizing Combinatorial Bandit Exploration »
Xinyan Hu · Dung Ngo · Aleksandrs Slivkins · Steven Wu -
2021 : Spotlight 1: Exploration and Incentives in Reinforcement Learning »
Max Simchowitz · Aleksandrs Slivkins -
2020 Poster: Efficient Contextual Bandits with Continuous Actions »
Maryam Majzoubi · Chicheng Zhang · Rajan Chari · Akshay Krishnamurthy · John Langford · Aleksandrs Slivkins -
2020 Poster: Constrained episodic reinforcement learning in concave-convex and knapsack settings »
Kianté Brantley · Miro Dudik · Thodoris Lykouris · Sobhan Miryoosefi · Max Simchowitz · Aleksandrs Slivkins · Wen Sun -
2017 : The Unfair Externalities of Exploration »
Aleksandrs Slivkins · Jennifer Wortman Vaughan -
2011 Poster: Multi-armed bandits on implicit metric spaces »
Aleksandrs Slivkins -
2009 Poster: Adapting to the Shifting Intent of Search Queries »
Umar Syed · Aleksandrs Slivkins · Nina Mishra