When Rule Learning Breaks: Diffusion Fails to Learn Parity of Many Bits
Binxu Wang · Emma Finn · Bingbin Liu
Abstract
Diffusion models can generate highly realistic samples, but do they learn the latent rules that govern a distribution, and if so, what kind of rule can they learn? We address this question using a controlled \emph{group-parity} benchmark on $6{\times}6$ binary images, where each group of $G$ bits must satisfy an even-parity constraint. This setup allows us to precisely tune rule complexity via $G$ and measure both correctness and memorization at the group and sample levels. Using EDM-parameterized Diffusion Transformers of varying depth, we find: (i) learnability depends jointly on $G$ and depth, with deeper models extending—but not eliminating—the range of learnable rules; (ii) successful rule learning exhibits a sharp early transition in accuracy that precedes memorization, creating a temporal window for generalization; (iii) memorization onset follows a steps-per-sample scaling law and is delayed by larger datasets. Theoretically, an energy/score analysis explains the depth dependence through the global multiplicative term in the parity score. Together, these results offer a principled testbed and new insights into the interplay between rule complexity, rule learning, and memorization in diffusion models.