Track: Orals & Spotlights Track 11: Learning Theory

Tue 8 Dec. 6:00 - 6:15 PST

Oral

Outstanding Paper

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Andrea Celli · Alberto Marchesi · Gabriele Farina · Nicola Gatti

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was currently unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is a close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

Tue 8 Dec. 6:15 - 6:30 PST

Oral

Efficient active learning of sparse halfspaces with arbitrary bounded noise

Chicheng Zhang · Jie Shen · Pranjal Awasthi

We study active learning of homogeneous $s$ -sparse halfspaces in $\mathbb{R}^d$ under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most $\eta$ for a parameter $\eta \in \big[0, \frac12\big)$ , known as the bounded noise. Even in the presence of mild label noise, i.e. $\eta$ is a small constant, this is a challenging problem and only recently have label complexity bounds of the form $\tilde{O}(s \cdot polylog(d, \frac{1}{\epsilon}))$ been established in [Zhang 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result [Awasthi et al. 2016] provides a computationally efficient algorithm with label complexity $\tilde{O}((s ln d/\epsilon)^{poly(1/(1-2\eta))})$ , which is label-efficient only when the noise rate $\eta$ is a fixed constant. In this work, we substantially improve on it by designing a polynomial time algorithm for active learning of $s$ -sparse halfspaces, with a label complexity of $\tilde{O}\big(\frac{s}{(1-2\eta)^4} polylog (d, \frac 1 \epsilon) \big)$ . This is the first efficient algorithm with label complexity polynomial in $\frac{1}{1-2\eta}$ in this setting, which is label-efficient even for $\eta$ arbitrarily close to $\frac12$ . Our active learning algorithm and its theoretical guarantees also immediately translate to new state-of-the-art label and sample complexity results for full-dimensional active and passive halfspace learning under arbitrary bounded noise.

Tue 8 Dec. 6:30 - 6:45 PST

Oral

Learning Parities with Neural Networks

Amit Daniely · Eran Malach

In recent years we see a rapidly growing line of research which shows learnability of various models via common neural network algorithms. Yet, besides a very few outliers, these results show learnability of models that can be learned using linear methods. Namely, such results show that learning neural-networks with gradient-descent is competitive with learning a linear classifier on top of a data-independent representation of the examples. This leaves much to be desired, as neural networks are far more successful than linear methods. Furthermore, on the more conceptual level, linear models don't seem to capture thedeepness" of deep networks. In this paper we make a step towards showing leanability of models that are inherently non-linear. We show that under certain distributions, sparse parities are learnable via gradient decent on depth-two network. On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.

Tue 8 Dec. 6:45 - 7:00 PST

Break

Tue 8 Dec. 7:00 - 7:10 PST

Spotlight

The Adaptive Complexity of Maximizing a Gross Substitutes Valuation

Ron Kupfer · Sharon Qian · Eric Balkanski · Yaron Singer

In this paper, we study the adaptive complexity of maximizing a monotone gross substitutes function under a cardinality constraint. Our main result is an algorithm that achieves a 1-epsilon approximation in O(log n) adaptive rounds for any constant epsilon > 0, which is an exponential speedup in parallel running time compared to previously studied algorithms for gross substitutes functions. We show that the algorithmic results are tight in the sense that there is no algorithm that obtains a constant factor approximation in o(log n) rounds. Both the upper and lower bounds are under the assumption that queries are only on feasible sets (i.e., of size at most k). We also show that under a stronger model, where non-feasible queries are allowed, there is no non-adaptive algorithm that obtains an approximation better than 1/2 + epsilon. Both lower bounds extend to the class of OXS functions. Additionally, we conduct experiments on synthetic and real data sets to demonstrate the near-optimal performance and efficiency of the algorithm in practice.

Tue 8 Dec. 7:10 - 7:20 PST

Spotlight

Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics

Aranyak Mehta · Uri Nadav · Alexandros Psomas · Aviad Rubinstein

We consider the fundamental problem of selecting $k$ out of $n$ random variables in a way that the expected highest or second-highest value is maximized. This question captures several applications where we have uncertainty about the quality of candidates (e.g. auction bids, search results) and have the capacity to explore only a small subset due to an exogenous constraint. For example, consider a second price auction where system constraints (e.g., costly retrieval or model computation) allow the participation of only $k$ out of $n$ bidders, and the goal is to optimize the expected efficiency (highest bid) or expected revenue (second highest bid). We study the case where we are given an explicit description of each random variable. We give a PTAS for the problem of maximizing the expected highest value. For the second-highest value, we prove a hardness result: assuming the Planted Clique Hypothesis, there is no constant factor approximation algorithm that runs in polynomial time. Surprisingly, under the assumption that each random variable has monotone hazard rate (MHR), a simple score-based algorithm, namely picking the $k$ random variables with the largest $1/\sqrt{k}$ top quantile value, is a constant approximation to the expected highest and second highest value, \emph{simultaneously}.

Tue 8 Dec. 7:20 - 7:30 PST

Spotlight

A Bandit Learning Algorithm and Applications to Auction Design

Kim Thang Nguyen

We consider online bandit learning in which at every time step, an algorithm has to make a decision and then observe only its reward. The goal is to design efficient (polynomial-time) algorithms that achieve a total reward approximately close to that of the best fixed decision in hindsight. In this paper, we introduce a new notion of $(\lambda,\mu)$ -concave functions and present a bandit learning algorithm that achieves a performance guarantee which is characterized as a function of the concavity parameters $\lambda$ and $\mu$ . The algorithm is based on the mirror descent algorithm in which the update directions follow the gradient of the multilinear extensions of the reward functions. The regret bound induced by our algorithm is $\widetilde{O}(\sqrt{T})$ which is nearly optimal. We apply our algorithm to auction design, specifically to welfare maximization, revenue maximization, and no-envy learning in auctions. In welfare maximization, we show that a version of fictitious play in smooth auctions guarantees a competitive regret bound which is determined by the smooth parameters. In revenue maximization, we consider the simultaneous second-price auctions with reserve prices in multi-parameter environments. We give a bandit algorithm which achieves the total revenue at least $1/2$ times that of the best fixed reserve prices in hindsight. In no-envy learning, we study the bandit item selection problem where the player valuation is submodular and provide an efficient $1/2$ -approximation no-envy algorithm.

Tue 8 Dec. 7:30 - 7:40 PST

Spotlight

An Optimal Elimination Algorithm for Learning a Best Arm

Avinatan Hassidim · Ron Kupfer · Yaron Singer

We consider the classic problem of $(\epsilon,\delta)$ -\texttt{PAC} learning a best arm where the goal is to identify with confidence $1-\delta$ an arm whose mean is an $\epsilon$ -approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst case sample complexity is not well understood. In this paper we propose a new approach for $(\epsilon,\delta)$ -\texttt{PAC} learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(\epsilon,\delta)$ -learning the mean of $n$ arms separately and we complement this result with a conditional matching lower bound. More specifically: $\begin{itemize} \item The algorithm's sample complexity converges to \emph{exactly} $\frac{n}{2\epsilon^2}\log \frac{1}{\delta}$ as $n$ grows and $\delta \geq \frac{1}{n}$; % \item We prove that no elimination algorithm obtains sample complexity arbitrarily lower than $\frac{n}{2\epsilon^2}\log \frac{1}{\delta}$. Elimination algorithms is a broad class of $(\epsilon,\delta)$-\texttt{PAC} best arm learning algorithms that includes many algorithms in the literature. \end{itemize}$ When $n$ is independent of $\delta$ our approach yields an algorithm whose sample complexity converges to $\frac{2n}{\epsilon^2} \log \frac{1}{\delta}$ as $n$ grows. In comparison with the best known algorithm for this problem our approach improves the sample complexity by a factor of over 1500 and over 6000 when $\delta\geq \frac{1}{n}$ .

Tue 8 Dec. 7:40 - 7:50 PST

Q&A

Joint Q&A for Preceeding Spotlights

Tue 8 Dec. 7:50 - 8:00 PST

Spotlight

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

Andres Masegosa · Stephan Lorenzen · Christian Igel · Yevgeny Seldin

We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.

Tue 8 Dec. 8:00 - 8:10 PST

Spotlight

PAC-Bayesian Bound for the Conditional Value at Risk

Zakaria Mhammedi · Benjamin Guedj · Robert Williamson

Conditional Value at Risk (CVaR) is a 'coherent risk measure' which generalizes expectation (reduced to a boundary parameter setting). Widely used in mathematical finance, it is garnering increasing interest in machine learning as an alternate approach to regularization, and as a means for ensuring fairness.
This paper presents a generalization bound for learning algorithms that minimize the CVaR of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be small when the empirical CVaR is small. We achieve this by reducing the problem of estimating CVaR to that of merely estimating an expectation. This then enables us, as a by-product, to obtain concentration inequalities for CVaR even when the random variable in question is unbounded.

Tue 8 Dec. 8:10 - 8:20 PST

Spotlight

Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability

Sitan Chen · Frederic Koehler · Ankur Moitra · Morris Yau

In this paper, we revisit the problem of distribution-independently learning halfspaces under Massart noise with rate $\eta$ . Recent work resolved a long-standing problem in this model of efficiently learning to error $\eta + \epsilon$ for any $\epsilon > 0$ , by giving an improper learner that partitions space into $\text{poly}(d,1/\epsilon)$ regions. Here we give a much simpler algorithm and settle a number of outstanding open questions: (1) We give the first \emph{proper} learner for Massart halfspaces that achieves $\eta + \epsilon$ . (2) Based on (1), we develop a blackbox knowledge distillation procedure to convert an arbitrarily complex classifier to an equally good proper classifier. (3) By leveraging a simple but overlooked connection to \emph{evolvability}, we show any SQ algorithm requires super-polynomially many queries to achieve $\mathsf{OPT} + \epsilon$ . We then zoom out to study generalized linear models and give an efficient algorithm for learning under a challenging new corruption model generalizing Massart noise. Finally we study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties as a byproduct of its strong provable robustness guarantees.

Tue 8 Dec. 8:20 - 8:30 PST

Spotlight

Hedging in games: Faster convergence of external and swap regrets

Xi Chen · Binghui Peng

We consider the setting where players run the Hedge algorithm or its optimistic variant \cite{syrgkanis2015fast} to play an n-action game repeatedly for T rounds. 1) For two-player games, we show that the regret of optimistic Hedge decays at \tilde{O}( 1/T ^{5/6} ), improving the previous bound O(1/T^{3/4}) by \cite{syrgkanis2015fast}. 2) In contrast, we show that the convergence rate of vanilla Hedge is no better than \tilde{\Omega}(1/ \sqrt{T})}, addressing an open question posted in \cite{syrgkanis2015fast}. For general m-player games, we show that the swap regret of each player decays at rate \tilde{O}(m^{1/2} (n/T)^{3/4}) when they combine optimistic Hedge with the classical external-to-internal reduction of Blum and Mansour \cite{blum2007external}. The algorithm can also be modified to achieve the same rate against itself and a rate of \tilde{O}(\sqrt{n/T}) against adversaries. Via standard connections, our upper bounds also imply faster convergence to coarse correlated equilibria in two-player games and to correlated equilibria in multiplayer games.

Tue 8 Dec. 8:30 - 8:40 PST

Spotlight

Online Bayesian Persuasion

Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Nicola Gatti

In Bayesian persuasion, an informed sender has to design a signaling scheme that discloses the right amount of information so as to influence the behavior of a self-interested receiver. This kind of strategic interaction is ubiquitous in real economic scenarios. However, the original model by Kamenica and Gentzkow makes some stringent assumptions which limit its applicability in practice. One of the most limiting assumptions is arguably that, in order to compute an optimal signaling scheme, the sender is usually required to know the receiver's utility function. In this paper, we relax this assumption through an online learning framework in which the sender faces a receiver with unknown type. At each round, the receiver's type is chosen adversarially from a finite set of possible types. We are interested in no-regret algorithms prescribing a signaling scheme at each round of the repeated interaction with performances close to that of the best-in-hindsight signaling scheme. First, we prove a hardness result on the per-iteration running time required to achieve the no-regret property. Then, we provide algorithms for the full and partial information model which exhibit regret sublinear in the number of rounds and polynomial in the parameters of the game.

Tue 8 Dec. 8:40 - 8:50 PST

Q&A

Joint Q&A for Preceeding Spotlights

Tue 8 Dec. 8:50 - 9:00 PST

Break

Main Navigation

Session

Orals & Spotlights Track 11: Learning Theory

Dylan Foster · Nicolò Cesa-Bianchi

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Efficient active learning of sparse halfspaces with arbitrary bounded noise

Learning Parities with Neural Networks

Break

The Adaptive Complexity of Maximizing a Gross Substitutes Valuation

Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics

A Bandit Learning Algorithm and Applications to Auction Design

An Optimal Elimination Algorithm for Learning a Best Arm

Joint Q&A for Preceeding Spotlights

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

PAC-Bayesian Bound for the Conditional Value at Risk

Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability

Hedging in games: Faster convergence of external and swap regrets

Online Bayesian Persuasion

Joint Q&A for Preceeding Spotlights

Break