Personalizing Many Decisions with High-Dimensional Covariates
Nima Hamidi · Mohsen Bayati · Kapil Gupta

Tue Dec 10th 05:30 -- 07:30 PM @ East Exhibition Hall B + C #16

We consider the k-armed stochastic contextual bandit problem with d dimensional features, when both k and d can be large. To the best of our knowledge, all existing algorithm for this problem have a regret bound that scale as polynomials of degree at least two, in k and d. The main contribution of this paper is to introduce and theoretically analyze a new algorithm (REAL Bandit) with a regret that scales by r^2(k+d) when r is rank of the k by d matrix of unknown parameters. REAL Bandit relies on ideas from low-rank matrix estimation literature and a new row-enhancement subroutine that yields sharper bounds for estimating each row of the parameter matrix that may be of independent interest.

Author Information

Nima Hamidi (Stanford University)
Mohsen Bayati (Stanford University)
Kapil Gupta (Airbnb)

More from the Same Authors