Your Bandit Model is Not Perfect: Introducing Robustness to Restless Bandits Enabled by Deep Reinforcement Learning
Jackson Killian · Lily Xu · Arpita Biswas · Milind Tambe

Restless multi-arm bandits (RMABs) are receiving renewed attention for their potential to model real-world planning problems under resource constraints. However, few RMAB models have surpassed theoretical interest, since they make the limiting assumption that model parameters are perfectly known. In the real world, model parameters often must be estimated via historical data or expert input, introducing uncertainty. In this light, we introduce a new paradigm, Robust RMABs, a challenging generalization of RMABs that incorporates interval uncertainty over parameters of the dynamic model of each arm. This uncovers several new challenges for RMABs and inspires new algorithmic techniques of general interest. Our contributions are: (i) We introduce the Robust Restless Bandit problem with interval uncertainty and solve a minimax regret objective; (ii) We tackle the complexity of the robust objective via a double oracle (DO) approach and analyze its convergence; (iii) To enable our DO approach, we introduce RMABPPO, a novel deep reinforcement learning (RL) algorithm for solving RMABs, of potential general interest; (iv) We design the first adversary algorithm for RMABs, required to implement the notoriously difficult minimax regret adversary oracle and also of general interest, by formulating it as a multi-agent RL problem and solving with a multi-agent extension of RMABPPO.
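
To make the minimax regret objective concrete, here is a minimal Python sketch on a toy problem. It is not the authors' method: it assumes a small, fully enumerated set of candidate policies and a finite set of parameter scenarios (standing in for points sampled from the uncertainty intervals), and it only considers pure policies. The reward values and the function names `regret_table` and `minimax_regret_policy` are hypothetical, chosen for illustration.

```python
def regret_table(rewards):
    """Return regret[p][s] for a table rewards[p][s].

    rewards[p][s] is the (assumed, made-up) return of policy p when the
    true parameters are scenario s. Regret is the gap to the best policy
    for that same scenario.
    """
    n_scenarios = len(rewards[0])
    best = [max(row[s] for row in rewards) for s in range(n_scenarios)]
    return [[best[s] - row[s] for s in range(n_scenarios)] for row in rewards]


def minimax_regret_policy(rewards):
    """Pick the policy whose worst-case regret over scenarios is smallest."""
    regrets = regret_table(rewards)
    worst = [max(row) for row in regrets]  # adversary picks the worst scenario
    p = min(range(len(worst)), key=worst.__getitem__)
    return p, worst[p]


# Toy data (hypothetical): 3 policies evaluated under 2 parameter scenarios.
rewards = [
    [10.0, 2.0],  # strong in scenario 0, weak in scenario 1
    [7.0, 5.0],   # balanced
    [4.0, 6.0],   # strong in scenario 1, weak in scenario 0
]

policy, regret = minimax_regret_policy(rewards)
print(policy, regret)
```

In this toy table the balanced policy has the smallest worst-case regret. With interval uncertainty the scenario set is continuous and the policy space is far too large to enumerate, which is the difficulty the abstract says the double oracle approach is meant to address.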

Author Information

Jackson Killian (Harvard University)
Lily Xu (Harvard University)
Arpita Biswas (Harvard University)
Milind Tambe (Harvard University/Google Research India)

More from the Same Authors