Poster
Generating compositional scenes via Text-to-image RGBA Instance Generation
Alessandro Fontanella · Petru-Daniel Tudosiu · Yongxin Yang · Shifeng Zhang · Sarah Parisot
East Exhibit Hall A-C #2409
Text-to-image diffusion models can generate high-quality images, but at the cost of tedious prompt engineering. Controllability can be improved by introducing layout conditioning; however, existing methods lack layout editing ability and fine-grained control over object attributes. In this work, we address layout-driven controllable image generation from a multi-layer perspective. We devise a novel training paradigm to adapt a diffusion model to generate isolated objects as RGBA images with transparency information. To build complex scenes, we then generate each scene component individually and introduce a multi-layer noise blending strategy that composites them into a realistic scene. Our experiments show that our RGBA diffusion model is capable of generating diverse, high-quality instances with precise control over object attributes. Through multi-layer composition, we demonstrate that our approach makes it possible to build and manipulate complex scenes with fine-grained control over object appearance and location, granting a higher degree of control than competing methods.
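As a rough illustration of what layer-wise composition in latent space can look like, the sketch below alpha-composites per-instance noisy latents onto a background latent at a single denoising step. The function name, tensor shapes, and blending rule are illustrative assumptions and not the paper's actual multi-layer noise blending strategy.

```python
# Minimal sketch (assumptions only, not the paper's algorithm): blend per-instance
# noisy latents into a composite scene latent using each instance's alpha channel
# as a soft mask, placed at a given latent-space location.
import torch

def blend_noisy_latents(background_latent, instance_latents, instance_alphas, boxes):
    """Composite per-instance latents onto a background latent.

    background_latent: (C, H, W) noisy latent of the background layer
    instance_latents:  list of (C, h, w) noisy latents, one per RGBA instance
    instance_alphas:   list of (1, h, w) alpha masks in [0, 1], resized to latent scale
    boxes:             list of (top, left) placement positions in latent coordinates
    """
    composite = background_latent.clone()
    for latent, alpha, (top, left) in zip(instance_latents, instance_alphas, boxes):
        c, h, w = latent.shape
        region = composite[:, top:top + h, left:left + w]
        # Soft alpha compositing in latent space: foreground over background.
        composite[:, top:top + h, left:left + w] = alpha * latent + (1 - alpha) * region
    return composite

# Toy usage with random tensors standing in for diffusion latents.
bg = torch.randn(4, 64, 64)
inst = [torch.randn(4, 16, 16)]
alpha = [torch.rand(1, 16, 16)]
out = blend_noisy_latents(bg, inst, alpha, boxes=[(10, 20)])
print(out.shape)  # torch.Size([4, 64, 64])
```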