

Poster

Few-Shot Diffusion Models Escape the Curse of Dimensionality

Ruofeng Yang · Bo Jiang · Cheng Chen · Ruinan Jin · Baoxiang Wang · Shuai Li

Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: While diffusion models have demonstrated impressive performance, there is a growing need for generating samples tailored to specific user-defined concepts. This customization requirement has driven the development of few-shot diffusion models, which use a limited number $n_{ta}$ of target samples to fine-tune a pre-trained diffusion model trained on $n_s$ source samples. Despite their empirical success, no theoretical work has specifically analyzed few-shot diffusion models. Moreover, existing results for diffusion models without a fine-tuning phase cannot explain why few-shot models generate high-quality samples, since those results suffer from the curse of dimensionality. In this work, we analyze few-shot diffusion models under a linear structure distribution with latent dimension $d$. From the approximation perspective, we prove that few-shot models achieve a $\widetilde{O}(n_s^{-2/d}+n_{ta}^{-1/2})$ bound for approximating the target score function, which improves on the $\widetilde{O}(n_{ta}^{-2/d})$ rate obtained without fine-tuning. From the optimization perspective, we consider a latent Gaussian special case and prove that the optimization problem admits a closed-form minimizer. This means few-shot models can directly obtain an approximate minimizer without a complex optimization process. Furthermore, we provide an accuracy bound of $\widetilde{O}(1/n_{ta}+1/\sqrt{n_s})$ for the empirical solution, which again has a better dependence on $n_{ta}$ than on $n_s$. Real-world experiments also show that models obtained by fine-tuning only the encoder and decoder specific to the target distribution can produce novel images with the target features, which supports our theoretical results.
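To make the rate comparison concrete, here is a minimal Python sketch that evaluates the two approximation rates side by side. The specific values of $d$, $n_s$, and $n_{ta}$ are hypothetical choices for illustration, not numbers from the paper: with few target samples, $n_{ta}^{-2/d}$ barely decays (the curse of dimensionality), while the few-shot bound is dominated by the dimension-free $n_{ta}^{-1/2}$ term.

```python
# Hypothetical illustration of the abstract's rate comparison (not from the paper):
# without fine-tuning, the rate n_ta^{-2/d} barely improves as n_ta grows,
# whereas the few-shot bound n_s^{-2/d} + n_ta^{-1/2} decays at a
# dimension-free rate in the number of target samples.

d = 16        # latent dimension (hypothetical)
n_s = 10**6   # source samples: large pre-training set (hypothetical)
n_ta = 100    # target samples: the few-shot regime (hypothetical)

no_finetune = n_ta ** (-2 / d)                # ~0.56: almost no decay in n_ta
few_shot = n_s ** (-2 / d) + n_ta ** (-0.5)   # ~0.28: n_ta enters at rate 1/sqrt(n_ta)

print(f"without fine-tuning: {no_finetune:.3f}")
print(f"few-shot bound:      {few_shot:.3f}")
```

Increasing $n_{ta}$ in this sketch shrinks the few-shot bound quickly, while the no-fine-tuning rate stays near 1 unless $n_{ta}$ grows exponentially in $d$, which is the gap the paper's analysis is meant to explain.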
