Workshop: Synthetic Data for Empowering ML Research

Distributional Privacy for Data Sharing

Zinan Lin · Shuaiqi Wang · Vyas Sekar · Giulia Fanti


Data sharing between parties has become an important engine of modern research and development. An important class of privacy concerns in data sharing relates to the underlying distribution of the data rather than to individual records. For example, the total traffic volume in a networking company's data can reveal the scale of its business. Unfortunately, existing privacy frameworks do not adequately address this class of concerns. In this paper, we propose distributional privacy, a framework for analyzing and protecting such distribution-level information in data sharing scenarios. Distributional privacy is applicable in multiple data sharing settings, including synthetic data release. Theoretically, we analyze lower and upper bounds on the privacy-distortion trade-off. Practically, we propose data release mechanisms for protecting distributional privacy, and demonstrate that they achieve better privacy-distortion trade-offs than alternative privacy mechanisms on real-world datasets.
