SphereEdit: Geometric Control for Composable Diffusion-Based Image Editing
Abstract
Despite significant advances in diffusion models, achieving precise, composable image editing without task-specific training remains a challenge. Existing approaches often rely on iterative optimization or linear latent operations, which are slow, brittle, and prone to entangling attributes (e.g., applying lipstick also altering skin tone). We introduce SphereEdit, a training-free framework that leverages the hyperspherical geometry of CLIP embeddings to enable interpretable, fine-grained control. We model semantic attributes as unit-norm directions on the sphere and show that this spherical representation supports clean composition via angular controls. At inference, SphereEdit uses these spherical directions to modulate cross-attention, producing spatially localized edits across diverse domains without optimization or fine-tuning. Experiments demonstrate sharper, more disentangled edits. SphereEdit provides a geometrically grounded, plug-and-play framework for controllable and composable diffusion-based image editing.
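
To make the notion of "angular controls on the sphere" concrete, the sketch below illustrates one plausible reading: a unit-norm embedding is moved along the geodesic toward a unit-norm attribute direction by a chosen angle, and several attributes are composed by applying successive angular steps. This is a minimal illustration only; the function name slerp_toward, the 512-dimensional toy vectors, and the specific attribute directions are assumptions for demonstration and are not taken from the paper.

```python
import numpy as np

def slerp_toward(e, d, theta):
    """Move unit embedding e along the geodesic toward unit direction d by angle theta."""
    # Tangent direction at e: the component of d orthogonal to e, renormalized.
    t = d - np.dot(e, d) * e
    t = t / np.linalg.norm(t)
    # Geodesic step: rotate e by angle theta within the plane spanned by (e, t).
    return np.cos(theta) * e + np.sin(theta) * t

# Toy example with hypothetical attribute directions (not learned from data).
rng = np.random.default_rng(0)
e = rng.normal(size=512); e /= np.linalg.norm(e)                # stand-in for a CLIP embedding
d_lipstick = rng.normal(size=512); d_lipstick /= np.linalg.norm(d_lipstick)
d_smile = rng.normal(size=512); d_smile /= np.linalg.norm(d_smile)

# Compose two edits with independent angular strengths.
edited = slerp_toward(e, d_lipstick, theta=0.15)
edited = slerp_toward(edited, d_smile, theta=0.10)
assert np.isclose(np.linalg.norm(edited), 1.0)                  # edits stay on the unit sphere
```

Because each step rotates within a plane tangent to the sphere, the result remains unit-norm, and the edit strength is the angle itself, which gives an interpretable, per-attribute control knob consistent with the geometric framing above.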