Workshop: Synthetic Data for Empowering ML Research

Generating High Fidelity Synthetic Data via Coreset selection and Entropic Regularization

Omead Pooladzandi · Pasha Khosravi · Erik Nijkamp · Baharan Mirzasoleiman


Generative models can synthesize data points drawn from the data distribution; however, not all generated samples are of high quality. In this paper, we propose combining coreset selection methods with "entropic regularization" to select the highest-fidelity samples. We leverage an energy-based model that resembles a variational auto-encoder, with an inference model and a generator model, whose latent prior is made more complex by an energy-based model. In a semi-supervised learning scenario, we show that augmenting the labeled dataset with our selected subset of samples yields a larger accuracy improvement than using all the synthetic samples.
