Contextual Value Iteration and Deep Approximation for Bayesian Contextual Bandits
Kevin Duijndam · Ger Koole · Rob van der Mei
Abstract
We present a Bayesian value-iteration framework for contextual multi-armed bandit problems that treats the agent's posterior distribution over the payoff as the state of a Markov Decision Process. We place finite-dimensional priors on the unknown reward parameters and on the exogenous context transition kernel. Value iteration on the resulting belief-MDP yields an optimal policy. We illustrate the approach in an airline seat-pricing simulation. To address the curse of dimensionality, we approximate the value function with a dual-stream deep neural network and benchmark our deep value-iteration algorithm on a standard contextual bandit instance.
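To make the belief-MDP idea concrete, here is a minimal sketch for a two-armed Bernoulli bandit in which one arm has a known mean and the other is learned. The Beta posterior parameters (a, b) serve as the MDP state, and finite-horizon value iteration is run by memoized recursion over belief transitions. The horizon `T` and known mean `MU0` are illustrative assumptions, not values from the paper, and the paper's contextual and deep-approximation components are omitted.

```python
from functools import lru_cache

T = 20     # planning horizon (illustrative)
MU0 = 0.5  # known mean reward of the safe arm (illustrative)

@lru_cache(maxsize=None)
def V(a, b, t):
    """Optimal value of belief state Beta(a, b) with t rounds remaining."""
    if t == 0:
        return 0.0
    p = a / (a + b)  # posterior mean of the unknown arm
    # Pulling the known arm leaves the belief unchanged.
    pull_known = MU0 + V(a, b, t - 1)
    # Pulling the unknown arm updates the Beta posterior on each outcome.
    pull_unknown = (p * (1.0 + V(a + 1, b, t - 1))
                    + (1.0 - p) * V(a, b + 1, t - 1))
    return max(pull_known, pull_unknown)

# Starting from a uniform Beta(1, 1) prior on the unknown arm:
print(round(V(1, 1, T), 3))
```

Because exploring the unknown arm can always be abandoned in favor of the known arm, `V(1, 1, T)` is at least `T * MU0`; the gap between the two quantifies the value of learning encoded in the belief-MDP.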