Timezone: »

Learning in Generalized Linear Contextual Bandits with Stochastic Delays
Zhengyuan Zhou · Renyuan Xu · Jose Blanchet

Wed Dec 11 04:45 PM -- 04:50 PM (PST) @ West Exhibition Hall A

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision maker only after some delay, which is unknown and stochastic, even though a decision must be made at each time step for an incoming set of contexts. We study the performance of upper confidence bound (UCB) based algorithms adapted to this delayed setting. In particular, we design a delay-adaptive algorithm, which we call Delayed UCB, for generalized linear contextual bandits using UCB-style exploration and establish regret bounds under various delay assumptions. In the important special case of linear contextual bandits, we further modify this algorithm and establish a tighter regret bound under the same delay assumptions. Our results contribute to the broad landscape of contextual bandits literature by establishing that UCB algorithms, which are widely deployed in modern recommendation engines, can be made robust to delays.

Author Information

Zhengyuan Zhou (Stanford University)
Renyuan Xu (University of Oxford)

Renyuan Xu is currently a Hooke Research Fellow in the Mathematical Institute at the University of Oxford. Prior to that, she obtained her Ph.D. degree in the IEOR Department at UC Berkeley in 2019 and the Bachelor's degree in Mathematics from the University of Science and Technology of China in 2014.

Jose Blanchet (Stanford University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors