When Rule Learning Breaks: Diffusion Fails to Learn Parity of Many Bits
Binxu Wang · Emma Finn · Bingbin Liu
Abstract
Diffusion models can generate highly realistic samples, but do they learn the latent rules that govern a distribution, and if so, what kind of rule can they learn? We address this question using a controlled \emph{group-parity} benchmark on $6{\times}6$ binary images, where each group of $G$ bits must satisfy an even-parity constraint. This setup allows us to precisely tune rule complexity via $G$ and measure both correctness and memorization at the group and sample levels. Using EDM-parameterized Diffusion Transformers of varying depth, we find: (i) learnability depends jointly on $G$ and depth, with deeper models extending—but not eliminating—the range of learnable rules; (ii) successful rule learning exhibits a sharp early transition in accuracy that precedes memorization, creating a temporal window for generalization; (iii) memorization onset follows a steps-per-sample scaling law and is delayed by larger datasets. Theoretically, an energy/score analysis explains the depth dependence through the global multiplicative term in the parity score. Together, these results offer a principled testbed and new insights into the interplay between rule complexity, rule learning, and memorization in diffusion models.