Poster
Bilevel Optimization with Lower-Level Contextual MDPs
Vinzenz Thoma · Barna Pásztor · Andreas Krause · Giorgia Ramponi · Yifan Hu
West Ballroom A-D #6505
In various applications, the optimal policy in a strategic decision-making problem depends on both the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization on Contextual Markov Decision Processes (BO-CMDP), a stochastic bilevel decision-making model in which the lower level consists of solving a contextual Markov Decision Process (CMDP). BO-CMDP can be viewed as a Stackelberg game in which the leader and a random context beyond the leader’s control together determine the setup of an MDP, while (potentially many) followers best respond to this setup. This framework extends beyond traditional bilevel optimization and is relevant to diverse fields such as model design for MDPs, economics, meta-reinforcement learning, and dynamic mechanism design. We propose the Hyper Policy Gradient Descent (HPGD) algorithm to solve BO-CMDP and demonstrate its convergence. Notably, HPGD uses only observations of the followers’ trajectories. It therefore allows the followers to use any training procedure and the leader to be agnostic of the specific algorithm used, which aligns with various real-world scenarios. We empirically demonstrate the performance of our algorithm.
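As a rough illustration of the bilevel structure described in the abstract, the problem can be sketched as follows. All notation here is assumed for illustration and is not taken from the abstract: x is the leader's decision, ξ the random context, M_{x,ξ} the MDP they jointly induce, π the follower's policy, f the leader's loss, and r_{x,ξ} and γ the follower's reward and discount factor. The paper's exact formulation may differ, for example by regularizing the lower level.

```latex
% Upper level (leader): choose x to minimize the expected loss over contexts.
\min_{x \in X} \; F(x) := \mathbb{E}_{\xi}\!\left[ f\!\left(x, \pi^{*}_{x,\xi}, \xi\right) \right]

% Lower level (follower): best response in the contextual MDP M_{x,\xi}.
\text{s.t.} \quad \pi^{*}_{x,\xi} \in \arg\max_{\pi} \;
  \mathbb{E}_{\pi,\, M_{x,\xi}}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{x,\xi}(s_t, a_t) \right]
```

In this reading, the leader never needs access to how the follower computes its best response; it only needs trajectories sampled from the resulting policy, which matches the abstract's statement that HPGD uses only observations of the followers' trajectories.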