High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Abstract
Generative AI holds promise for overcoming data access restrictions that hinder cardiovascular research, but the real-world use of synthetic ECG models is limited by persistent gaps in trustworthiness and clinical utility. In this work, we address two key shortcomings of current methods: insufficient morphological fidelity and the inability to generate personalized, patient-specific signals. To close these gaps, we build on a state-of-the-art diffusion model with two principled innovations: (1) MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a frequency-domain supervision framework that enforces physiological realism, and (2) multimodal demographic conditioning to enable patient-specific synthesis. We comprehensively evaluate our approach on the PTB-XL dataset, measuring signal fidelity, clinical coherence, privacy risk, and downstream utility. Our results show that MIDT-ECG improves morphological coherence, reducing inter-lead correlation error by 70%, while demographic conditioning enhances signal-to-noise ratio and personalization. Critically, classifiers trained solely on synthetic ECGs achieve performance comparable to those trained on real data in low-data regimes. Together, these contributions deliver a scalable, privacy-preserving approach for generating trustworthy, personalized medical time series, advancing the responsible use of generative AI in healthcare.