Score-based Idempotent Distillation of Diffusion Models
Abstract
Diffusion and score-based models are popular approaches to generative modeling that iteratively transport samples from a source distribution, usually a Gaussian, to a target data distribution. These models have gained popularity due to their stable training dynamics and high-fidelity generation quality. However, this stability and quality come at a high computational cost, as samples must be transported along the entire trajectory over many network evaluations. An idempotent operator, by contrast, can project samples onto the data distribution in a single application. In this work, we unite diffusion and idempotent models by training idempotent models through distillation from diffusion models. We present Score-based Idempotent Generative Networks (SIGNs), generative models that support both single- and multi-step generation. We show that networks trained under the idempotent constraint can effectively distill a pre-trained diffusion model and enable faster inference than iterative score-based models. Like IGNs and score-based models, SIGNs can perform multi-step sampling, allowing users to trade off quality for efficiency. Because these models operate directly on the source domain, they can project corrupted or alternate distributions back onto the target manifold, enabling zero-shot editing of inputs. We validate our models on a simple multi-modal dataset as well as multiple image datasets, achieving state-of-the-art results for idempotent models on the CIFAR and CelebA datasets.
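As a one-line illustration of the property named in the title (the notation $f$, $z$, and $p_{\mathrm{data}}$ below is ours, not taken from the paper): an operator $f$ is idempotent when a second application changes nothing,
\[
  f(f(z)) = f(z), \qquad f(x) = x \ \text{ for } x \sim p_{\mathrm{data}},
\]
so $f$ acts as a projection onto its range; in the IGN setting, $f$ is additionally trained to leave samples already on the data manifold fixed, which is what makes single-step generation and zero-shot projection of corrupted inputs possible.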