Timezone: »
Poster
Sequential Experimental Design for Transductive Linear Bandits
Lalit Jain · Kevin Jamieson · Tanner Fiez · Lillian Ratliff
Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #10
In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. The transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we present the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
Author Information
Lalit Jain (University of Washington)
Kevin Jamieson (U Washington)
Tanner Fiez (University of Washington)
Lillian Ratliff (University of Washington)
More from the Same Authors
-
2022 Poster: Active Learning with Safety Constraints »
Romain Camilleri · Andrew Wagenmaker · Jamie Morgenstern · Lalit Jain · Kevin Jamieson -
2022 Poster: Instance-optimal PAC Algorithms for Contextual Bandits »
Zhaoqi Li · Lillian Ratliff · houssam nassif · Kevin Jamieson · Lalit Jain -
2022 Poster: Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design »
Andrew Wagenmaker · Kevin Jamieson -
2021 : Beyond No Regret: Instance-Dependent PAC Reinforcement Learning »
Andrew Wagenmaker · Kevin Jamieson -
2021 : Closing the loop in Machine Learning: Learning to optimize with decision dependent data »
Lillian Ratliff -
2021 Poster: Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games »
Tanner Fiez · Lillian Ratliff · Eric Mazumdar · Evan Faulkner · Adhyyan Narang -
2021 Poster: Selective Sampling for Online Best-arm Identification »
Romain Camilleri · Zhihan Xiong · Maryam Fazel · Lalit Jain · Kevin Jamieson -
2021 Poster: Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers »
Julian Katz-Samuels · Blake Mason · Kevin Jamieson · Rob Nowak -
2021 Poster: Online Learning in Periodic Zero-Sum Games »
Tanner Fiez · Ryann Sim · Stratis Skoulakis · Georgios Piliouras · Lillian Ratliff -
2021 Poster: Corruption Robust Active Learning »
Yifang Chen · Simon Du · Kevin Jamieson -
2020 Poster: An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits »
Julian Katz-Samuels · Lalit Jain · zohar karnin · Kevin Jamieson -
2019 : Contributed talk: Characterizing Equilibria in Stackelberg Games »
Tanner Fiez -
2019 Poster: A New Perspective on Pool-Based Active Classification and False-Discovery Control »
Lalit Jain · Kevin Jamieson -
2019 Poster: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs »
Max Simchowitz · Kevin Jamieson -
2018 Poster: A Bandit Approach to Sequential Experimental Design with False Discovery Control »
Kevin Jamieson · Lalit Jain