
Workshop: Synthetic Data Generation with Generative AI

Continuous Diffusion for Mixed-Type Tabular Data

Markus Mueller · Kathrin Gruber · Dennis Fok

Keywords: [ mixed-type data ] [ Synthetic Data Generation ] [ generative model ] [ tabular data ] [ Diffusion model ]


Score-based generative models or diffusion models have proven successful across many domains in generating texts and images. However, the consideration of mixed-type tabular data with this model family has fallen short so far. Existing research mainly combines continuous and categorical diffusion processes and does not explicitly account for the feature heterogeneity inherent to tabular data. In this paper, we combine score matching and score interpolation to ensure a common type of continuous noise distribution that affects both continuous and categorical features. Further, we investigate the impact of distinct noise schedules per feature or per data type. We allow for adaptive, learnable noise schedules to ensure optimally allocated model capacity and balanced generative capability. Results show that our model outperforms the benchmark models consistently and that accounting for heterogeneity within the noise schedule design boosts sample quality.
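
To make the idea of a shared continuous noise distribution with feature-specific noise levels more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that categorical features are mapped to continuous embedding vectors before Gaussian noise is added, and that each feature has its own noise level. The function name `add_noise`, the embedding tables, and all noise values are hypothetical; the learnable schedules and the score network described in the abstract are not modeled here.

```python
import numpy as np

def add_noise(x_cont, cat_codes, embeddings, sigma_cont, sigma_cat, rng):
    """Perturb continuous and embedded categorical features with Gaussian noise.

    x_cont:     (n, d_cont) array of continuous features
    cat_codes:  (n, d_cat) integer array of category indices
    embeddings: list of d_cat arrays, each (n_categories_j, emb_dim)  (assumed)
    sigma_cont: (d_cont,) noise level per continuous feature          (assumed)
    sigma_cat:  (d_cat,) noise level per categorical feature          (assumed)
    """
    # Continuous features: the same Gaussian noise family, scaled per feature.
    noisy_cont = x_cont + sigma_cont * rng.standard_normal(x_cont.shape)

    # Categorical features: look up the embedding, then add Gaussian noise
    # in embedding space so both data types share one type of noise.
    noisy_cat = []
    for j, table in enumerate(embeddings):
        emb = table[cat_codes[:, j]]  # shape (n, emb_dim)
        noisy_cat.append(emb + sigma_cat[j] * rng.standard_normal(emb.shape))
    return noisy_cont, noisy_cat

# Toy data: 4 rows, 2 continuous features, 2 categorical features.
rng = np.random.default_rng(0)
x_cont = rng.standard_normal((4, 2))
cat_codes = np.array([[0, 1], [2, 0], [1, 1], [0, 0]])
embeddings = [rng.standard_normal((3, 5)), rng.standard_normal((2, 5))]

noisy_cont, noisy_cat = add_noise(
    x_cont, cat_codes, embeddings,
    sigma_cont=np.array([0.1, 1.0]),  # distinct level per continuous feature
    sigma_cat=np.array([0.5, 2.0]),   # distinct level per categorical feature
    rng=rng,
)
```

Using a single noise level for all continuous features and another for all categorical features would correspond to a schedule per data type rather than per feature.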
