Restless multi-arm bandits (RMABs) are receiving renewed attention for their potential to model real-world planning problems under resource constraints. However, few RMAB models have surpassed theoretical interest, since they make the limiting assumption that model parameters are perfectly known. In the real world, model parameters often must be estimated via historical data or expert input, introducing uncertainty. In this light, we introduce a new paradigm, Robust RMABs, a challenging generalization of RMABs that incorporates interval uncertainty over parameters of the dynamic model of each arm. This uncovers several new challenges for RMABs and inspires new algorithmic techniques of general interest. Our contributions are: (i) We introduce the Robust Restless Bandit problem with interval uncertainty and solve a minimax regret objective; (ii) We tackle the complexity of the robust objective via a double oracle (DO) approach and analyze its convergence; (iii) To enable our DO approach, we introduce RMABPPO, a novel deep reinforcement learning (RL) algorithm for solving RMABs, of potential general interest; (iv) We design the first adversary algorithm for RMABs, required to implement the notoriously difficult minimax regret adversary oracle and also of general interest, by formulating it as a multi-agent RL problem and solving it with a multi-agent extension of RMABPPO.
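To make the minimax regret objective concrete, here is a minimal Python sketch. It assumes a small, fully enumerated set of candidate policies and parameter scenarios with made-up returns, and it considers pure strategies only. The abstract describes interval uncertainty handled through a double oracle with learned oracles, so this illustrates the objective, not the authors' algorithm. The function name `minimax_regret` and the `rewards` table are hypothetical.

```python
from typing import List, Tuple


def minimax_regret(rewards: List[List[float]]) -> Tuple[int, float]:
    """Pick the policy whose worst-case regret is smallest.

    rewards[i][j] is the (hypothetical) return of candidate policy i
    when the true model parameters are scenario j.
    """
    n_scenarios = len(rewards[0])
    # Best return any candidate policy achieves in each scenario.
    best = [max(row[j] for row in rewards) for j in range(n_scenarios)]
    # Regret of a policy in a scenario = best return there minus its own return;
    # an adversary picks the scenario that maximizes this regret.
    worst_regret = [
        max(best[j] - row[j] for j in range(n_scenarios)) for row in rewards
    ]
    # The planner picks the policy that minimizes the adversary's maximum.
    chosen = min(range(len(rewards)), key=worst_regret.__getitem__)
    return chosen, worst_regret[chosen]


# Illustrative numbers only: 3 candidate policies x 3 parameter scenarios.
rewards = [
    [10.0, 2.0, 6.0],
    [7.0, 5.0, 6.0],
    [4.0, 6.0, 5.0],
]
policy, regret = minimax_regret(rewards)
print(policy, regret)
```

In this toy table the first policy is best in one scenario but loses badly in another, so a policy that is never far from the best in any scenario is selected instead.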
Author Information
Jackson Killian (Harvard University)
Lily Xu (Harvard University)
Arpita Biswas (Harvard University)
Milind Tambe (Harvard University/Google Research India)
More from the Same Authors
- 2021 Spotlight: Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning
  Kai Wang · Sanket Shah · Haipeng Chen · Andrew Perrault · Finale Doshi-Velez · Milind Tambe
- 2021 Meetup: Global
  Lily Xu
- 2021 : Your Bandit Model is Not Perfect: Introducing Robustness to Restless Bandits Enabled by Deep Reinforcement Learning
  Jackson Killian
- 2021 : Invite Talk Q&A
  Milind Tambe · Tejumade Afonja · Paula Rodriguez Diaz
- 2021 : Invited Talk: AI for Social Impact: Results from Deployments for Public Health
  Milind Tambe
- 2021 Poster: Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning
  Kai Wang · Sanket Shah · Haipeng Chen · Andrew Perrault · Finale Doshi-Velez · Milind Tambe
- 2020 : Q/A and Panel Discussion for People-Earth with Dan Kammen and Milind Tambe
  Daniel Kammen · Milind Tambe · Giulio De Leo · Mayur Mudigonda · Surya Karthik Mukkavilli
- 2020 : Q/A and Discussion
  Surya Karthik Mukkavilli · Mayur Mudigonda · Milind Tambe
- 2020 : Milind Tambe
  Milind Tambe
- 2020 Poster: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
  Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe
- 2020 Spotlight: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
  Kai Wang · Bryan Wilder · Andrew Perrault · Milind Tambe
- 2020 Poster: Collapsing Bandits and Their Application to Public Health Intervention
  Aditya Mate · Jackson Killian · Haifeng Xu · Andrew Perrault · Milind Tambe