Poster
Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing
Siyi Chen · Huijie Zhang · Minzhe Guo · Yifu Lu · Peng Wang · Qing Qu
East Exhibit Hall A-C #1610
Recently, diffusion models have emerged as a powerful class of generative models with impressive capabilities. Despite their success in generating images guided by class or text conditions, achieving precise and disentangled image editing without additional training remains a significant challenge. In this work, we take a step toward addressing this problem by starting from an intriguing observation: within a certain range of noise levels, the learned posterior mean predictor (PMP) is locally linear, and the singular vectors of its Jacobian lie in low-dimensional semantic subspaces. Under mild data assumptions, we validate the low-rankness and linearity of the PMP, as well as the homogeneity, composability, and linearity of the identified semantic directions within the subspace. These properties are quite universal, appearing consistently across various network architectures (e.g., UNet and Transformers) and datasets. These insights motivate us to propose LOw-rank COntrollable image editing (LOCO Edit). Specifically, the local linearity of the Jacobian provides a single-step, training-free method for precise local editing of regions of interest, while the low-rank structure allows for the effective identification of semantic directions via subspace power methods. Our method is broadly applicable to both undirected and text-directed editing and works across various diffusion-based models. Finally, extensive empirical studies demonstrate the effectiveness and efficiency of our approach.
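To make the core idea concrete, below is a minimal sketch of a generic subspace power iteration on the Jacobian of a PMP, of the kind the abstract describes. It is not the authors' released implementation: the function `pmp` (the denoiser's posterior mean predictor), the noisy latent `x_t`, the timestep `t`, and the optional edit-region `mask` are all assumed placeholders, and the iteration simply approximates the top-k right singular vectors of the Jacobian using Jacobian-vector and vector-Jacobian products.

```python
import torch


def top_singular_directions(pmp, x_t, t, k=4, iters=10, mask=None, seed=0):
    """Hypothetical sketch: subspace power iteration on the Jacobian of a
    posterior-mean predictor f(x_t, t), approximating its top-k right
    singular vectors. `pmp`, `x_t`, `t`, and `mask` are placeholders for
    the user's own denoiser, noisy latent, timestep, and region mask."""

    def f(x):
        out = pmp(x, t)                      # freeze the timestep
        return out * mask if mask is not None else out

    g = torch.Generator().manual_seed(seed)
    # Random orthonormal basis V in the input (latent) space.
    V = torch.randn(x_t.numel(), k, generator=g).to(x_t.device, x_t.dtype)
    V, _ = torch.linalg.qr(V)

    for _ in range(iters):
        cols = []
        for j in range(k):
            v = V[:, j].reshape_as(x_t)
            # Forward product J v via a Jacobian-vector product.
            _, Jv = torch.func.jvp(f, (x_t,), (v,))
            # Backward product J^T (J v) via a vector-Jacobian product.
            _, vjp_fn = torch.func.vjp(f, x_t)
            (JtJv,) = vjp_fn(Jv)
            cols.append(JtJv.reshape(-1))
        # Re-orthonormalize the subspace (the QR step of subspace iteration).
        V, _ = torch.linalg.qr(torch.stack(cols, dim=1))

    # Columns of V approximate dominant right singular vectors of the Jacobian;
    # each one can be reshaped to the latent shape and used as an edit direction.
    return V
```

Under the local-linearity observation, adding a scaled direction to `x_t` and re-running the PMP once would then yield an edit confined (approximately) to the masked region, which is the intuition behind the single-step, training-free editing described above.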