Poster
Fast Rates for Bandit PAC Multiclass Classification
Liad Erez · Alon Peled-Cohen · Tomer Koren · Yishay Mansour · Shay Moran
West Ballroom A-D #5605
Abstract:
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem, with sample complexity of $O\big((\operatorname{poly}(K) + 1/\varepsilon^2)\log(|H|/\delta)\big)$ for any finite hypothesis class $H$. In terms of the leading dependence on $\varepsilon$, this improves upon existing bounds for the problem, which are of the form $O(K/\varepsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $\log|H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011), who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is $\Theta(K)$. We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only $O(1)$ as $\varepsilon \to 0$. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over $H$.
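To illustrate the flavor of the exploration step, here is a minimal sketch (not the paper's actual algorithm or objective): Frank-Wolfe updates on the probability simplex over a finite hypothesis class, minimizing a log-barrier style potential. The coverage matrix `A`, the function `frank_wolfe_log_barrier`, and the toy data are all hypothetical illustrations introduced here for concreteness.

```python
import numpy as np

# Illustrative sketch only: Frank-Wolfe on the simplex over n hypotheses,
# minimizing F(p) = -(1/m) * sum_j log((A p)_j), a log-barrier style potential.
# Here A[j, i] is a hypothetical "coverage" value (e.g. how well hypothesis i's
# predictions cover hypothesis j); keeping every (A p)_j bounded away from zero
# is what makes the resulting exploration distribution low-variance.

def frank_wolfe_log_barrier(A, num_iters=500):
    m, n = A.shape
    p = np.full(n, 1.0 / n)                    # start from the uniform distribution
    for t in range(num_iters):
        cover = A @ p                          # per-hypothesis coverage under p
        grad = -(A.T @ (1.0 / cover)) / m      # gradient of F at p
        i_star = int(np.argmin(grad))          # Frank-Wolfe vertex: single best hypothesis
        gamma = 2.0 / (t + 2)                  # standard Frank-Wolfe step size
        p = (1 - gamma) * p
        p[i_star] += gamma                     # move toward the vertex e_{i_star}
    return p

# Toy usage with a random symmetric "agreement" matrix (hypothetical data).
rng = np.random.default_rng(0)
B = rng.uniform(0.05, 1.0, size=(6, 6))
A = (B + B.T) / 2
np.fill_diagonal(A, 1.0)
p = frank_wolfe_log_barrier(A)
print(p, p.sum())
```

Each Frank-Wolfe step only requires maximizing a linear function over the hypothesis class, which is why, under the assumptions above, such updates can be driven by an ERM-style oracle rather than by enumerating $H$.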