NeurIPS Poster Generalized Linear Bandits with Limited Adaptivity

Spotlight Poster

Generalized Linear Bandits with Limited Adaptivity

Ayush Sawarni · Nirjhar Das · Siddharth Barman · Gaurav Sinha

West Ballroom A-D #5800

[ Abstract ]

[ Paper] [ Slides] [ OpenReview]

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract: We study the generalized linear contextual bandit problem within the constraints of limited adaptivity. In this paper, we present two algorithms, B-GLinCB and RS-GLinCB, that address, respectively, two prevalent limited adaptivity settings. Given a budget

M

$M$ on the number of policy updates, in the first setting, the algorithm needs to decide upfront

M

$M$ rounds at which it will update its policy, while in the second setting it can adaptively perform

M

$M$ policy updates during its course. For the first setting, we design an algorithm B-GLinCB, that incurs

\tilde{O} (\sqrt{T})

$\tilde{O}(\sqrt{T})$ regret when

M = Ω (\log \log T)

$M = \Omega( \log{\log T} )$ and the arm feature vectors are generated stochastically. For the second setting, we design an algorithm RS-GLinCB that updates its policy

\tilde{O} (\log^{2} T)

$\tilde{O}(\log^2 T)$ times and achieves a regret of

\tilde{O} (\sqrt{T})

$\tilde{O}(\sqrt{T})$ even when the arm feature vectors are adversarially generated. Notably, in these bounds, we manage to eliminate the dependence on a key instance dependent parameter

κ

$\kappa$ , that captures non-linearity of the underlying reward model. Our novel approach for removing this dependence for generalized linear contextual bandits might be of independent interest.

Chat is not available.