

Poster

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Weicai Ye · Chenhao Ji · Zheng Chen · Junyao Gao · Xiaoshui Huang · Song-Hai Zhang · Wanli Ouyang · Tong He · Cairong Zhao · Guofeng Zhang


Abstract: Diffusion-based methods have achieved impressive success in 2D image and 3D object generation. However, 3D scene generation, and even $360^{\circ}$ image generation, remains constrained due to the limited number of scene datasets, the complexity of 3D scenes themselves, and the difficulty of generating consistent multi-view images. To address these issues, we first build a panoramic video dataset containing millions of consecutive panoramic frames with corresponding camera poses and text descriptions. We then propose a novel text-driven panorama generation framework that achieves scalable, consistent, and diverse panoramic scene generation. To generate multi-view consistent panoramic images, we design a spherical epipolar attention module that uses relative camera poses to ensure multi-view consistency. Extensive experiments demonstrate that our method can generate scalable, consistent, and diverse panoramic images.
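The abstract does not spell out how the spherical epipolar attention module works. The sketch below only illustrates the underlying geometry and is not the authors' implementation. For two panoramas related by a relative pose, a ray from a pixel in the first view constrains its possible matches in the second view to a great circle on the sphere, and an epipolar-aware attention layer could restrict or bias each query toward key pixels near that circle. Everything here is an assumption: the equirectangular convention (longitude left to right, latitude top to bottom, y up), the pose convention X2 = R X1 + t, the angular threshold, and all function names (`pixel_to_direction`, `epipolar_normal`, `angle_to_epipolar_circle`, `epipolar_mask`), which are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def pixel_to_direction(u: int, v: int, width: int, height: int) -> Vec3:
    """Map an equirectangular pixel center to a unit ray on the sphere.

    Assumed convention: longitude spans [-pi, pi) from left to right,
    latitude spans [pi/2, -pi/2] from top to bottom, and y points up.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return (dot(m[0], v), dot(m[1], v), dot(m[2], v))


def epipolar_normal(rot: Mat3, trans: Vec3, d1: Vec3) -> Vec3:
    """Unit normal of the epipolar plane, expressed in view 2's frame.

    Assuming X2 = R @ X1 + t, every 3D point on ray d1 lands on the
    plane through view 2's center spanned by t and R @ d1. Its normal
    t x (R @ d1) equals E @ d1 with the essential matrix E = [t]_x R.
    """
    n = cross(trans, mat_vec(rot, d1))
    norm = math.sqrt(dot(n, n))
    if norm < 1e-12:
        raise ValueError("ray is parallel to the baseline; epipolar plane undefined")
    return (n[0] / norm, n[1] / norm, n[2] / norm)


def angle_to_epipolar_circle(n: Vec3, d2: Vec3) -> float:
    """Angle in radians between unit ray d2 and the great circle with normal n."""
    s = max(-1.0, min(1.0, dot(n, d2)))  # clamp rounding error before asin
    return abs(math.asin(s))


def epipolar_mask(rot: Mat3, trans: Vec3, d1: Vec3,
                  width: int, height: int, max_angle: float) -> List[List[bool]]:
    """Boolean (height x width) mask of view-2 pixels within max_angle of the circle.

    The whole great circle is kept, which is a superset of the arc the
    ray can actually project onto; a real module might also weight keys
    by distance instead of hard masking.
    """
    n = epipolar_normal(rot, trans, d1)
    return [[angle_to_epipolar_circle(n, pixel_to_direction(u, v, width, height)) <= max_angle
             for u in range(width)]
            for v in range(height)]
```

As a usage example, with an identity rotation and a baseline along the x-axis, a query ray looking straight ahead along z has the equator as its epipolar circle, so only the rows nearest the horizon survive a 30-degree threshold. Hard masking keeps the example simple; a soft bias added to attention logits would preserve gradients for keys just outside the band.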
