Poster
Fast Rates for Bandit PAC Multiclass Classification
Liad Erez · Alon Peled-Cohen · Tomer Koren · Yishay Mansour · Shay Moran
West Ballroom A-D #5605
Abstract:
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem, with sample complexity of $O\big((\operatorname{poly}(K) + 1/\varepsilon^2)\log(|H|/\delta)\big)$ for any finite hypothesis class $H$. In terms of the leading dependence on $\varepsilon$, this improves upon existing bounds for the problem, which are of the form $O(K/\varepsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $\log|H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011), who demonstrated that the multiplicative price of bandit feedback in realizable PAC learning is $\Theta(K)$. We complement this by revealing a stark contrast with the agnostic case, where the price of bandit feedback is only $O(1)$ as $\varepsilon \to 0$. Our algorithm utilizes a stochastic optimization technique to minimize a log-barrier potential based on Frank-Wolfe updates for computing a low-variance exploration distribution over the hypotheses, and is made computationally efficient provided access to an ERM oracle over $H$.
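To illustrate the flavor of the exploration step, here is a minimal sketch (not the paper's actual algorithm or objective): Frank-Wolfe updates on the probability simplex over a finite hypothesis class, minimizing a log-barrier style potential. The coverage matrix `A`, the function `frank_wolfe_log_barrier`, and the toy data are all hypothetical illustrations introduced here for concreteness.

```python
import numpy as np

# Illustrative sketch only: Frank-Wolfe on the simplex over n hypotheses,
# minimizing F(p) = -(1/m) * sum_j log((A p)_j), a log-barrier style potential.
# Here A[j, i] is a hypothetical "coverage" value (e.g. how well hypothesis i's
# predictions cover hypothesis j); keeping every (A p)_j bounded away from zero
# is what makes the resulting exploration distribution low-variance.

def frank_wolfe_log_barrier(A, num_iters=500):
    m, n = A.shape
    p = np.full(n, 1.0 / n)                    # start from the uniform distribution
    for t in range(num_iters):
        cover = A @ p                          # per-hypothesis coverage under p
        grad = -(A.T @ (1.0 / cover)) / m      # gradient of F at p
        i_star = int(np.argmin(grad))          # Frank-Wolfe vertex: single best hypothesis
        gamma = 2.0 / (t + 2)                  # standard Frank-Wolfe step size
        p = (1 - gamma) * p
        p[i_star] += gamma                     # move toward the vertex e_{i_star}
    return p

# Toy usage with a random symmetric "agreement" matrix (hypothetical data).
rng = np.random.default_rng(0)
B = rng.uniform(0.05, 1.0, size=(6, 6))
A = (B + B.T) / 2
np.fill_diagonal(A, 1.0)
p = frank_wolfe_log_barrier(A)
print(p, p.sum())
```

Each Frank-Wolfe step only requires maximizing a linear function over the hypothesis class, which is why, under the assumptions above, such updates can be driven by an ERM-style oracle rather than by enumerating $H$.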