Poster
On Differentially Private U Statistics
Kamalika Chaudhuri · Po-Ling Loh · Shourya Pandey · Purnamrita Sarkar
West Ballroom A-D #6309
[
Abstract
]
Wed 11 Dec 4:30 p.m. PST
— 7:30 p.m. PST
Abstract:
In this paper, we consider the problem of private estimation of estimable parameters of the form $\mathbb{E}[h(X_1,\dots,X_k)]$ where $X_1,\dots, X_n$ are i.i.d. data from some distribution. Without privacy concerns, this is estimated via U statistics, which are widely used estimators that arise in a broad class of problems, from nonparametric signed rank tests, symmetry testing, and uniformity testing to subgraph counts in random networks. They are simply averages of an appropriate kernel applied to all subsets of a given size (also known as the degree) of a given sample. Despite the recent outpouring of interest in private mean estimation, private algorithms for general U statistics have received little attention. We show that while existing private mean estimation algorithms can be applied to obtain confidence intervals, they typically lead to a suboptimal private error. While the variance of non-degenerate U statistics is typically $O(1/n)$, in many hypothesis testing examples, under the null, the corresponding U statistic becomes degenerate with variance $O(1/n^2)$. In specific cases where the U statistics are degenerate or have vanishing moments, the private error of simple extensions of existing algorithms may even be of a larger order than the non-private error. To remedy this, we propose a new thresholding-based approach using what we call local Hájek projections to reweight different subsets. This leads to a nearly optimal private error in the non-degenerate setting and a strong indication of near-optimality in the degenerate setting.
Live content is unavailable. Log in and register to view live content