MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion
Shitao Tang · Fuyang Zhang · Jiacheng Chen · Peng Wang · Yasutaka Furukawa
Great Hall & Hall B1+B2 (level 1) #201
This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from a panorama or multi-view images with known geometry (depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion generates all images concurrently with global awareness, producing high-resolution, content-rich images and addressing the error accumulation prevalent in preceding models. Specifically, MVDiffusion incorporates a correspondence-aware attention mechanism that enables effective cross-view interaction. This mechanism underpins three pivotal modules: 1) a generation module that produces low-resolution images while maintaining global correspondence, 2) an interpolation module that densifies spatial coverage between images, and 3) a super-resolution module that upscales the results into high-resolution images. For panoramic imagery, MVDiffusion generates high-resolution photorealistic images of up to 1024×1024 pixels. For geometry-conditioned multi-view image generation, MVDiffusion demonstrates state-of-the-art performance on texture-map generation for a given scene mesh. We recommend referring to our arXiv version at https://arxiv.org/pdf/2307.01097.pdf for the latest update. The project page is at https://mvdiffusion.github.io/.
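To make the idea of pixel-to-pixel correspondence between panorama crops more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes two pinhole views that share a camera center and differ only by a rotation, so a pixel maps across views through K R K^-1. The helper names (`intrinsics`, `yaw`, `correspond`, `cross_view_attention`), the 90° field of view, the 45° yaw, and the 3×3 neighbourhood are all illustrative assumptions. The attention function is a generic softmax dot-product attention over features near the corresponding pixel, with the same features used as keys and values and without learned projections or positional encodings.

```python
import numpy as np


def intrinsics(fov_deg: float, size: int) -> np.ndarray:
    """Pinhole intrinsics for a square image with the given field of view."""
    f = (size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    c = size / 2.0
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])


def yaw(deg: float) -> np.ndarray:
    """Rotation about the vertical axis (x right, y down, z forward)."""
    t = np.radians(deg)
    return np.array([[np.cos(t), 0.0, np.sin(t)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(t), 0.0, np.cos(t)]])


def correspond(uv, K, R):
    """Map pixel (u, v) of the source view into the target view.

    R rotates source-camera rays into the target camera frame (assumed
    convention). Returns None when the ray points behind the target camera.
    """
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p = K @ (R @ ray)
    if p[2] <= 0:
        return None
    return p[:2] / p[2]


def cross_view_attention(query, tgt_feats, center_uv, radius=1):
    """Attend from one query vector to target-view features near center_uv.

    tgt_feats has shape (H, W, C). Neighbours outside the image are skipped;
    if none remain, a zero vector is returned.
    """
    h, w, c = tgt_feats.shape
    u0 = int(np.rint(center_uv[0]))
    v0 = int(np.rint(center_uv[1]))
    keys = []
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            u, v = u0 + du, v0 + dv
            if 0 <= u < w and 0 <= v < h:
                keys.append(tgt_feats[v, u])
    if not keys:
        return np.zeros(c)
    keys = np.stack(keys)
    logits = keys @ query / np.sqrt(c)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ keys


if __name__ == "__main__":
    K = intrinsics(fov_deg=90.0, size=256)
    print(correspond((128.0, 128.0), K, yaw(45.0)))
```

Under these assumptions, the center of a 256-pixel, 90° view lands on the right edge of a view rotated by 45°, which shows how neighbouring crops overlap and why features at corresponding locations can be exchanged across views.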