
Workshop: Synthetic Data for Empowering ML Research

Noise-Aware Statistical Inference with Differentially Private Synthetic Data

Ossi Räisä · Joonas Jälkö · Antti Honkela · Samuel Kaski


Existing work has shown that analysing differentially private (DP) synthetic data as if it were real does not produce valid uncertainty estimates. We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation (MI) with a novel noise-aware (NA) synthetic data generation algorithm, NAPSU-MQ. The resulting pipeline, NA+MI, allows computing accurate uncertainty estimates for population-level quantities from DP synthetic data. Our experiments demonstrate that the pipeline produces accurate confidence intervals from DP synthetic data.
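The abstract does not spell out how the MI step combines analyses of several synthetic datasets. As a rough illustration only, the sketch below assumes the combining rules of Raghunathan, Reiter and Rubin (2003) for fully synthetic data: analyse each of the m synthetic datasets separately, then pool the point estimates and variance estimates. The function name, the `size_ratio` fallback for a non-positive pooled variance, and the normal approximation for the interval (in place of a t-distribution) are assumptions made for this example, not details taken from the paper.

```python
import math
from statistics import NormalDist, mean, variance


def combine_synthetic_estimates(point_estimates, variance_estimates,
                                confidence=0.95, size_ratio=1.0):
    """Pool per-dataset estimates from m synthetic datasets (hypothetical helper).

    point_estimates[i]    : estimate of the quantity from synthetic dataset i
    variance_estimates[i] : estimated variance of that estimate
    size_ratio            : assumed synthetic-to-real sample size ratio, used
                            only in the fallback when the pooled variance is
                            not positive
    """
    m = len(point_estimates)
    if m < 2 or m != len(variance_estimates):
        raise ValueError("need at least two datasets and matching lengths")

    q_bar = mean(point_estimates)      # pooled point estimate
    b = variance(point_estimates)      # between-dataset sample variance
    v_bar = mean(variance_estimates)   # average within-dataset variance

    # Pooled variance for fully synthetic data (assumed combining rule).
    t = (1 + 1 / m) * b - v_bar
    if t <= 0:
        # Assumed fallback: the estimate can be negative for small m.
        t = size_ratio * v_bar

    # Simplification: normal quantile instead of a t-distribution quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(t)
    return q_bar, t, (q_bar - half_width, q_bar + half_width)
```

For example, `combine_synthetic_estimates([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])` pools three per-dataset estimates into a single estimate, variance and interval. In the pipeline described above, the synthetic datasets would come from a noise-aware generator, so that the between-dataset variance also reflects the noise added for DP; this sketch covers only the pooling step.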
