Active Learning for Stochastic Contextual Linear Bandits
Emma Brunskill · Ishani Karmarkar · Zhaoqi Li
Abstract
A key goal in stochastic contextual linear bandits is to efficiently learn a near-optimal policy. Prior algorithms for this problem learn a policy by strategically sampling actions and naively (passively) sampling contexts from the underlying context distribution. However, in many practical scenarios---including online content recommendation, survey research, and clinical trials---practitioners can actively sample or recruit contexts based on prior knowledge of the context distribution. Despite this potential for _active learning_, the role of strategic context sampling in stochastic contextual linear bandits is underexplored. We propose an algorithm that learns a near-optimal policy by strategically sampling rewards of context-action pairs. We prove _instance-dependent_ theoretical guarantees demonstrating that our active context sampling strategy can improve over the minimax rate by up to a factor of $\sqrt{d}$, where $d$ is the linear dimension. We also show empirically that our algorithm reduces the number of samples needed to learn a near-optimal policy on tasks such as warfarin dose prediction and joke recommendation.
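To make the setting concrete, here is a minimal, self-contained Python sketch of the stochastic contextual linear bandit model and the distinction the abstract draws between passive and active context sampling. This is purely illustrative under assumed toy quantities (`theta_star`, `phi`, `context_dist`, the uniform action rule, and `query_dist` are all hypothetical) and is not the paper's algorithm or its sampling strategy.

```python
import numpy as np

# Toy stochastic contextual linear bandit: rewards are linear in a
# d-dimensional feature of each (context, action) pair, plus noise.
rng = np.random.default_rng(0)
d, n_contexts, n_actions = 5, 8, 4
theta_star = rng.normal(size=d)                    # unknown reward parameter
phi = rng.normal(size=(n_contexts, n_actions, d))  # feature map phi(x, a)
context_dist = rng.dirichlet(np.ones(n_contexts))  # known context distribution

def sample_reward(x, a):
    """Noisy linear reward for context x and action a."""
    return phi[x, a] @ theta_star + rng.normal(scale=0.1)

def passive_round():
    """Passive setting: the context arrives i.i.d. from context_dist,
    and the learner only chooses the action (here, a placeholder rule)."""
    x = rng.choice(n_contexts, p=context_dist)
    a = rng.integers(n_actions)
    return x, a, sample_reward(x, a)

def active_round(query_dist):
    """Active setting: the learner also chooses which context to query,
    e.g., to concentrate samples where uncertainty most affects the policy."""
    x = rng.choice(n_contexts, p=query_dist)
    a = rng.integers(n_actions)
    return x, a, sample_reward(x, a)
```

The only difference between the two rounds is who controls the context: in the passive case it is drawn from the environment's distribution, while in the active case the learner supplies its own query distribution over contexts, which is the extra degree of freedom the abstract refers to.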