Workshop: Synthetic Data for Empowering ML Research

A source data privacy framework for synthetic clinical trial data

Afrah Shafquat · Jason Mezey · Mandis Beigi · Jimeng Sun · Jacob Aptekar

Fri 2 Dec 7:42 a.m. PST — 7:44 a.m. PST


Synthetic clinical trial data create opportunities for data sharing, cross-collaboration, and innovation for these valuable, siloed data sources. While the value of synthetic clinical trial data depends on the privacy protection they offer clinical trial participants, the true degree of privacy has been questioned in recent literature. Given the highly sensitive nature of clinical trial data, especially the private health information they contain, there is an urgent need for a framework specifically designed to provide guaranteed levels of privacy for synthetic datasets generated from clinical trial data. In this paper, we propose a practical privacy framework that ensures, by design, the privacy of synthetic clinical trial data at the level of the source data and provides objective, measurable bounds on disclosure risks through a combination of technical, policy, and algorithmic controls. The proposed framework enforces privacy before synthetic datasets are generated and therefore complements the privacy-preserving attributes intrinsic to the algorithms used for synthetic data generation. To demonstrate how the components of the framework address the privacy requirements of clinical trial data, we discuss how this privacy system responds to a set of realistic adversarial scenarios. Ultimately, we believe the proposed framework can foster more privacy research in clinical trial data sharing.
