Poster
Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games
Hedi Hadiji · Sarah Sachs · Tim van Erven · Wouter Koolen
In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected payoffs for all their possible actions under the randomized action played by their opponent. This classical model has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\frac{\ln K}{\epsilon})$ instead of $O(\frac{\ln K}{\epsilon^2})$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterize the query complexity of learning exact equilibria ($\epsilon=0$): we show that they require a number of queries that is linear in $K$, so that computing them is essentially as hard as querying the whole matrix, which itself can be done with $K$ queries. Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\frac{\ln K}{\epsilon}, K))$. We argue that, unfortunately, a matching lower bound cannot be obtained with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to an optimization problem over the hypercube by encoding it as a binary payoff matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(\frac{1}{K\epsilon}))$ for any $\epsilon \leq 1/(cK^4)$, where $c$ is a constant independent of $K$. We further discuss possible future directions for improving our techniques to close the gap with the upper bounds.
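To make the query model concrete, here is a minimal Python sketch (not code from the paper) of a first-order oracle together with optimistic Hedge updates for both players, in the spirit of the Rakhlin and Sridharan result cited above. The names `make_first_order_oracle` and `approx_nash` and the parameter choices `T` and `eta` are illustrative assumptions; the step size is not tuned to recover the stated $O(\frac{\ln K}{\epsilon})$ rate.

```python
import numpy as np

def make_first_order_oracle(A):
    """One query in the first-order model: a player submits the opponent's
    mixed strategy and observes the vector of expected payoffs for each
    of their own K actions (never the matrix A itself)."""
    def row_query(y):
        return A @ y      # row player's expected losses under y
    def col_query(x):
        return A.T @ x    # column player's expected gains under x
    return row_query, col_query

def softmax(z):
    z = z - z.max()       # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def approx_nash(A, T=2000, eta=0.1):
    """Optimistic Hedge for both players (illustrative parameters);
    returns the averaged strategies after T rounds, using two
    first-order queries per round."""
    K = A.shape[0]
    row_query, col_query = make_first_order_oracle(A)
    loss_sum = np.zeros(K)   # row player's cumulative loss vector
    gain_sum = np.zeros(K)   # column player's cumulative gain vector
    last_loss = np.zeros(K)  # optimistic prediction = last observation
    last_gain = np.zeros(K)
    x_avg, y_avg = np.zeros(K), np.zeros(K)
    for _ in range(T):
        x = softmax(-eta * (loss_sum + last_loss))  # row minimizes
        y = softmax(eta * (gain_sum + last_gain))   # column maximizes
        last_loss, last_gain = row_query(y), col_query(x)
        loss_sum += last_loss
        gain_sum += last_gain
        x_avg += x / T
        y_avg += y / T
    return x_avg, y_avg

def duality_gap(A, x, y):
    """(x, y) is an eps-approximate Nash equilibrium iff the gap <= eps."""
    return (A.T @ x).max() - (A @ y).min()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.uniform(-1.0, 1.0, size=(10, 10))
    x, y = approx_nash(A)
    print("duality gap:", duality_gap(A, x, y))
```

Note that each round costs exactly two queries (one per player), which is the resource counted by the query complexity bounds discussed in the abstract.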
Author Information
Hedi Hadiji (L2S Univ. Paris-Saclay)
Sarah Sachs (University of Amsterdam)
Tim van Erven (University of Amsterdam)
Wouter Koolen (Centrum Wiskunde & Informatica, Amsterdam)
More from the Same Authors
- 2021: Regret Minimization in Heavy-Tailed Bandits
  Shubhada Agrawal · Sandeep Juneja · Wouter Koolen
- 2023 Poster: First- and Second-Order Bounds for Adversarial Linear Contextual Bandits
  Julia Olkhovskaya · Jack Mayo · Tim van Erven · Gergely Neu · Chen-Yu Wei
- 2023 Poster: Adaptive Selective Sampling for Online Prediction with Experts
  Rui Castro · Fredrik Hellström · Tim van Erven
- 2022 Poster: Luckiness in Multiscale Online Learning
  Wouter Koolen · Muriel F. Pérez-Ortiz
- 2022 Poster: Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness
  Sarah Sachs · Hedi Hadiji · Tim van Erven · Cristóbal Guzmán
- 2021 Poster: A/B/n Testing with Control in the Presence of Subpopulations
  Yoan Russac · Christina Katsimerou · Dennis Bohle · Olivier Cappé · Aurélien Garivier · Wouter Koolen
- 2021 Poster: Optimal Best-Arm Identification Methods for Tail-Risk Measures
  Shubhada Agrawal · Wouter Koolen · Sandeep Juneja
- 2019 Poster: Pure Exploration with Multiple Correct Answers
  Rémy Degenne · Wouter Koolen
- 2019 Poster: Non-Asymptotic Pure Exploration by Solving Games
  Rémy Degenne · Wouter Koolen · Pierre Ménard
- 2018 Poster: Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
  Emilie Kaufmann · Wouter Koolen · Aurélien Garivier
- 2017 Poster: Random Permutation Online Isotonic Regression
  Wojciech Kotlowski · Wouter Koolen · Alan Malek
- 2017 Poster: Monte-Carlo Tree Search by Best Arm Identification
  Emilie Kaufmann · Wouter Koolen
- 2017 Spotlight: Monte-Carlo Tree Search by Best Arm Identification
  Emilie Kaufmann · Wouter Koolen
- 2016 Poster: Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
  Wouter Koolen · Peter Grünwald · Tim van Erven
- 2016 Poster: MetaGrad: Multiple Learning Rates in Online Learning
  Tim van Erven · Wouter Koolen
- 2016 Oral: MetaGrad: Multiple Learning Rates in Online Learning
  Tim van Erven · Wouter Koolen
- 2015: Discussion Panel
  Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu
- 2015: Learning Faster from Easy Data II: Introduction
  Tim van Erven
- 2015 Workshop: Learning Faster from Easy Data II
  Tim van Erven · Wouter Koolen
- 2015 Poster: Minimax Time Series Prediction
  Wouter Koolen · Alan Malek · Peter Bartlett · Yasin Abbasi Yadkori
- 2014 Poster: Efficient Minimax Strategies for Square Loss Games
  Wouter M Koolen · Alan Malek · Peter Bartlett
- 2014 Poster: Learning the Learning Rate for Prediction with Expert Advice
  Wouter M Koolen · Tim van Erven · Peter Grünwald
- 2013 Workshop: Learning Faster From Easy Data
  Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck
- 2013 Workshop: Large Scale Matrix Analysis and Inference
  Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht
- 2013 Poster: The Pareto Regret Frontier
  Wouter M Koolen
- 2012 Poster: Mixability in Statistical Learning
  Tim van Erven · Peter Grünwald · Mark Reid · Robert Williamson
- 2012 Poster: Putting Bayes to sleep
  Wouter M Koolen · Dmitri Adamskiy · Manfred K. Warmuth
- 2012 Spotlight: Putting Bayes to sleep
  Wouter M Koolen · Dmitri Adamskiy · Manfred K. Warmuth
- 2011 Poster: Adaptive Hedge
  Tim van Erven · Peter Grünwald · Wouter M Koolen · Steven D Rooij
- 2011 Poster: Learning Eigenvectors for Free
  Wouter M Koolen · Wojciech Kotlowski · Manfred K. Warmuth