Skip to yearly menu bar Skip to main content


Poster

Private and Personalized Frequency Estimation in a Federated Setting

Amrith Setlur · Vitaly Feldman · Kunal Talwar

[ ]
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Motivated by the problem of next word prediction on user devices we introduce and study the problem of personalized frequency histogram estimation in a federated setting. In this problem, each user observes a number of samples from a distribution over some domain which is specific to that user. The goal is to compute for all users a personalized estimate of the user's distribution with error measured in KL divergence. We focus on addressing two central challenges: statistical heterogeneity and protection of user privacy.Our approach to the problem relies on discovering and exploiting similar subpopulations of users which are often present in real-world data. To guide the intuition and design of algorithms for this problem we propose a simple data model which is based on a mixture of Dirichlet distributions. We design a (non-private) clustering-based algorithm for the problem and demonstrate several of its properties formally in our model. We also give a joint differentially private version of our algorithm along with a private data-dependent initialization scheme. Finally, we provide an extensive empirical evaluation of our private and non-private algorithms under varying levels of statistical and size heterogeneity on the Reddit, StackOverflow, and Amazon Reviews datasets. Our results demonstrate significant improvements over standard and clustering-based baselines, and in particular, they show that it is possible to improve over direct personalization of a single global model.

Live content is unavailable. Log in and register to view live content