Poster
Dueling Bandits with Team Comparisons
Lee Cohen · Ulrike Schmidt-Kraepelin · Yishay Mansour
We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of $k$-sized teams from a universe of $n$ players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team which wins against any other disjoint team (with probability at least $1/2$). Noisy comparisons are linked to a total order on the teams. We formalize our model by building upon the dueling bandits setting (Yue et al. 2012) and provide several algorithms, for both the stochastic and the deterministic settings. For the stochastic setting, we provide a reduction to the classical dueling bandits setting, yielding an algorithm that identifies a Condorcet winning team within $\mathcal{O}((n + k \log (k)) \frac{\max(\log\log n, \log k)}{\Delta^2})$ duels, where $\Delta$ is a gap parameter. For deterministic feedback, we additionally present a gap-independent algorithm that identifies a Condorcet winning team within $\mathcal{O}(nk\log(k)+k^5)$ duels.
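To make the definition of a Condorcet winning team concrete, here is a minimal brute-force sketch for the deterministic setting. It is not one of the algorithms from the paper. It assumes, purely for illustration, that the total order on teams comes from summing hypothetical per-player strengths; the abstract only states that comparisons are linked to a total order on the teams. The names `strength`, `beats`, and `condorcet_winning_teams` are invented for this example.

```python
from itertools import combinations

def team_value(team, strength):
    # Assumption: a team's value is the sum of its players' strengths.
    return sum(strength[p] for p in team)

def beats(a, b, strength):
    # Deterministic duel: the team with the strictly higher value wins.
    return team_value(a, strength) > team_value(b, strength)

def condorcet_winning_teams(strength, k):
    # Enumerate every k-sized team and keep those that beat
    # every team disjoint from them (exhaustive, exponential in k).
    players = range(len(strength))
    teams = [frozenset(t) for t in combinations(players, k)]
    return [
        a for a in teams
        if all(beats(a, b, strength) for b in teams if a.isdisjoint(b))
    ]

# Hypothetical strengths for n = 6 players, teams of size k = 2.
strength = [50, 40, 30, 19, 10, 5]
winners = condorcet_winning_teams(strength, k=2)
print(sorted(sorted(team) for team in winners))
```

In this toy instance more than one team qualifies, because a team only has to beat the teams that are disjoint from it. The exhaustive search also issues far more duels than the bounds quoted in the abstract, which is what the paper's algorithms are designed to avoid.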
Author Information
Lee Cohen (Tel Aviv University)
Ulrike Schmidt-Kraepelin (TU Berlin)
Yishay Mansour (Tel Aviv University & Google)
More from the Same Authors
- 2021 Spotlight: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2021 Poster: Minimax Regret for Stochastic Shortest Path
  Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg
- 2021 Oral: Optimal Rates for Random Order Online Optimization
  Uri Sherman · Tomer Koren · Yishay Mansour
- 2021 Poster: Optimal Rates for Random Order Online Optimization
  Uri Sherman · Tomer Koren · Yishay Mansour
- 2021 Poster: Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
  Aviv Rosenberg · Yishay Mansour
- 2021 Poster: Differentially Private Multi-Armed Bandits in the Shuffle Model
  Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer
- 2021 Poster: ROI Maximization in Stochastic Online Decision-Making
  Nicolò Cesa-Bianchi · Tom Cesari · Yishay Mansour · Vianney Perchet
- 2021 Poster: Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
  Ayush Sekhari · Christoph Dann · Mehryar Mohri · Yishay Mansour · Karthik Sridharan
- 2017 Workshop: Learning in the Presence of Strategic Behavior
  Nika Haghtalab · Yishay Mansour · Tim Roughgarden · Vasilis Syrgkanis · Jennifer Wortman Vaughan
- 2017 Poster: Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues
  Noga Alon · Moshe Babaioff · Yannai A. Gonczarowski · Yishay Mansour · Shay Moran · Amir Yehudayoff
- 2017 Spotlight: Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues
  Noga Alon · Moshe Babaioff · Yannai A. Gonczarowski · Yishay Mansour · Shay Moran · Amir Yehudayoff
- 2017 Poster: Multi-Armed Bandits with Metric Movement Costs
  Tomer Koren · Roi Livni · Yishay Mansour