Caffarelli Regularity and Hierarchical Phase Boundaries in Diffusion Model Latent Space
Abstract
Recent studies have shown phase-transition–like behavior in diffusion models, where a small perturbation of the initial Gaussian noise sample can cause an abrupt change in the generated image. The mechanism underlying these transitions, however, remains theoretically underexplored. In this work, we study this phenomenon through a Riemannian metric on the latent space, induced by the distance between CLIP embeddings of the corresponding generated images. We observe a hierarchical emergence of phase boundaries: coarse boundaries appear in the early denoising steps, and finer boundaries progressively emerge within these regions as denoising proceeds. These findings have practical implications for diffusion inversion–based image editing: images within the same Riemannian basin can be edited with only a few inversion steps, whereas images that are nearby in latent space but separated by a phase boundary require substantially more steps. To provide a theoretical foundation, we approximate the reverse diffusion dynamics by a discrete-time sequence of quadratic-cost optimal transport maps between successive noisy marginals. Using Caffarelli's regularity theory, we show that discontinuities of the diffusion generative map are associated with mode splitting, which gives rise to phase boundaries. This yields a hierarchical, tree-like organization of the modes of the data distribution, which implies that distances between images in this geometry are ultrametric.
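As a toy illustration only (not the construction used in this work), the two mathematical ingredients of the abstract can be checked in a few lines. In one dimension, the quadratic-cost optimal transport map between two measures is the monotone rearrangement T = Q_target ∘ F_source. When the target has disconnected support, which is the simplest case where Caffarelli-type continuity fails, T must jump. Here the source is a standard Gaussian and the target is an assumed equal-weight uniform mixture on [-2, -1] ∪ [1, 2], so T jumps from -1 to 1 at x = 0, a one-dimensional caricature of mode splitting. The second part checks the ultrametric inequality d(x, z) ≤ max(d(x, y), d(y, z)) for distances read off a hypothetical merge tree, and contrasts it with ordinary distances on a line. All names and numbers below are illustrative assumptions.

```python
from itertools import permutations
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # assumed source measure: standard Gaussian


def target_quantile(u: float) -> float:
    """Quantile function of an equal-weight uniform mixture on [-2, -1] U [1, 2].

    This target is a hypothetical choice; its support is disconnected.
    """
    if u < 0.5:
        return -2.0 + 2.0 * u      # maps [0, 0.5) onto [-2, -1)
    return 1.0 + 2.0 * (u - 0.5)   # maps [0.5, 1] onto [1, 2]


def ot_map(x: float) -> float:
    """1D quadratic-cost OT map (monotone rearrangement): T = Q_target o F_source."""
    return target_quantile(STD_NORMAL.cdf(x))


def is_ultrametric(points, dist) -> bool:
    """Return True if d(x, z) <= max(d(x, y), d(y, z)) holds for every ordered triple."""
    return all(
        dist(x, z) <= max(dist(x, y), dist(y, z))
        for x, y, z in permutations(points, 3)
    )


# Hypothetical merge tree: leaves a, b merge at height 1, leaves c, d merge at
# height 1, and the two pairs merge at height 3. The distance between two leaves
# is the height at which they first merge (cophenetic distance).
MERGE_HEIGHT = {frozenset("ab"): 1.0, frozenset("cd"): 1.0}


def tree_dist(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return MERGE_HEIGHT.get(frozenset((x, y)), 3.0)


if __name__ == "__main__":
    eps = 1e-6
    # The map jumps across x = 0: values just left of 0 land near -1, just right near +1.
    print(ot_map(-eps), ot_map(eps))
    print(is_ultrametric("abcd", tree_dist))                        # tree distances
    print(is_ultrametric([0.0, 1.0, 2.0], lambda p, q: abs(p - q)))  # line distances
```

The line example fails the check because d(0, 2) = 2 exceeds max(d(0, 1), d(1, 2)) = 1, whereas every triple of tree leaves satisfies it, since any two leaves are separated only by the height of their lowest common merge.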