Skip to yearly menu bar Skip to main content

Workshop: Synthetic Data for Empowering ML Research

Synthetic Clinical Trial Data while Preserving Subject-Level Privacy

Mandis Beigi · Afrah Shafquat · Jason Mezey · Jacob Aptekar


Clinical trials capture high-quality data for millions of patients each year, yet these data are largely unavailable for research beyond the scope of any individual trial due to a combination of regulatory, intellectual property, and patient privacy barriers. Synthetic clinical trial data that captures the analytical properties of the source data, could provide significant value for research and drug development by making insights widely available while protecting the privacy of the participants. We present a method for generating research-grade synthetic clinical trial data from a real data source. We compared the fidelity and privacy preservation performance of our method to the state-of-the-art deep learning synthesizers and found that our synthesizer had superior performance when applied to clinical trial data as assessed both by established metrics and when considering critical clinical features. We also demonstrate how the privacy settings may be configured to conform to specific privacy policies governing data sharing.

Chat is not available.