NeurIPS Poster Private Distribution Learning with Public Data: The View from Sample Compression

Spotlight Poster

Private Distribution Learning with Public Data: The View from Sample Compression

Shai Ben-David · Alex Bie · Clément L Canonne · Gautam Kamath · Vikrant Singhal

Great Hall & Hall B1+B2 (level 1) #1611

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Abstract: We study the problem of private distribution learning with access to public data. In this setup, which we refer to as *public-private learning*, the learner is given public and private samples drawn from an unknown distribution

p

$p$ belonging to a class

Q

$\mathcal Q$ , with the goal of outputting an estimate of

p

$p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class

Q

$\mathcal Q$ is connected to the existence of a sample compression scheme for

Q

$\mathcal Q$ , as well as to an intermediate notion we refer to as \emph{list learning}. Leveraging this connection: (1) approximately recovers previous results on Gaussians over

R^{d}

$\mathbb R^d$ ; and (2) leads to new ones, including sample complexity upper bounds for arbitrary

k

$k$ -mixtures of Gaussians over

R^{d}

$\mathbb R^d$ , results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in

R^{d}

$\mathbb R^d$ , at least

d

$d$ public samples are necessary for private learnability, which is close to the known upper bound of

d + 1

$d+1$ public samples.

Chat is not available.