NeurIPS Poster On Optimal Learning Under Targeted Data Poisoning

Poster

On Optimal Learning Under Targeted Data Poisoning

Steve Hanneke · Amin Karbasi · Mohammad Mahmoody · Idan Mehalel · Shay Moran

Hall J (level 1) #822

[ Abstract ]

[ Paper] [ OpenReview]

Abstract: Consider the task of learning a hypothesis class

$\mathcal{H}$ in the presence of an adversary that can replace up to an

$\eta$ fraction of the examples in the training set with arbitrary adversarial examples. The adversary aims to fail the learner on a particular target test point

$x$ which is \emph{known} to the adversary but not to the learner. In this work we aim to characterize the smallest achievable error

$\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary in both realizable and agnostic settings. We fully achieve this in the realizable setting, proving that

$\epsilon=\Theta(\mathtt{VC}(\mathcal{H})\cdot \eta)$ , where

$\mathtt{VC}(\mathcal{H})$ is the VC dimension of

$\mathcal{H}$ . Remarkably, we show that the upper bound can be attained by a deterministic learner. In the agnostic setting we reveal a more elaborate landscape: we devise a deterministic learner with a multiplicative regret guarantee of

$\epsilon \leq C\cdot\mathtt{OPT} + O(\mathtt{VC}(\mathcal{H})\cdot \eta)$ , where

$C > 1$ is a universal numerical constant. We complement this by showing that for any deterministic learner there is an attack which worsens its error to at least

$2\cdot \mathtt{OPT}$ . This implies that a multiplicative deterioration in the regret is unavoidable in this case. Finally, the algorithms we develop for achieving the optimal rates are inherently improper. Nevertheless, we show that for a variety of natural concept classes, such as linear classifiers, it is possible to retain the dependence

$\epsilon=\Theta_{\mathcal{H}}(\eta)$ by a proper algorithm in the realizable setting. Here

$\Theta_{\mathcal{H}}$ conceals a polynomial dependence on

$\mathtt{VC}(\mathcal{H})$ .

Chat is not available.