Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Muyang Li · Ji Lin · Chenlin Meng · Stefano Ermon · Song Han · Jun-Yan Zhu

Thu Dec 01 02:30 PM -- 04:00 PM (PST) @ Hall J #139
During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to make gradual changes to the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited regions. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. When the edited region covers 1.2% of the image area, our method reduces the computation of DDIM by $7.5\times$ and GauGAN by $18\times$ while preserving the visual fidelity. With SIGE, we accelerate DDIM inference by $3.0\times$ on an RTX 3090 and $6.6\times$ on an Apple M1 Pro CPU, and GauGAN inference by $4.2\times$ on an RTX 3090 and $14\times$ on an Apple M1 Pro CPU.
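The cache-and-reuse idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the SIGE implementation: it assumes a single-channel image, a single 3x3 convolution with zero padding, and a fixed tile size. The names `conv3x3`, `sparse_update`, and `tile` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def conv3x3(x, k):
    """Dense 3x3 'same' convolution (cross-correlation) with zero padding."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out


def sparse_update(original, edited, cached_out, k, tile=4):
    """Recompute only the tiles whose output can change; reuse the cache elsewhere.

    Hypothetical helper: returns the updated output and the number of
    tiles that were recomputed.
    """
    h, w = edited.shape
    changed = original != edited

    # An output pixel depends on its 3x3 input neighbourhood, so dilate
    # the change mask by one pixel to find every affected output pixel.
    pm = np.pad(changed, 1)
    dirty = np.zeros_like(changed)
    for dy in range(3):
        for dx in range(3):
            dirty |= pm[dy:dy + h, dx:dx + w]

    padded = np.pad(edited, 1)
    out = cached_out.copy()  # start from the cached features
    n_tiles = 0
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            if not dirty[y:y + tile, x:x + tile].any():
                continue  # unedited tile: keep the cached result
            y1, x1 = min(y + tile, h), min(x + tile, w)
            patch = padded[y:y1 + 2, x:x1 + 2]  # tile plus a 1-pixel halo
            res = np.zeros((y1 - y, x1 - x))
            for dy in range(3):
                for dx in range(3):
                    res += k[dy, dx] * patch[dy:dy + (y1 - y), dx:dx + (x1 - x)]
            out[y:y1, x:x1] = res  # scatter the recomputed tile back
            n_tiles += 1
    return out, n_tiles


rng = np.random.default_rng(0)
original = rng.random((16, 16))
kernel = rng.random((3, 3))
cached = conv3x3(original, kernel)   # computed once for the original image

edited = original.copy()
edited[5:7, 5:7] += 1.0              # a small local edit

sparse_out, n_tiles = sparse_update(original, edited, cached, kernel)
dense_out = conv3x3(edited, kernel)  # reference: recompute everything
```

In this toy setting the sparse result matches the dense recomputation while touching only the tiles that overlap the edit. A real generative model stacks many layers, so the affected region grows with the receptive field, and turning the saved computation into lower latency is the engineering problem the abstract attributes to SIGE.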

Author Information

Muyang Li (Carnegie Mellon University)

I’m a second-year MSR student at CMU, advised by Prof. Jun-Yan Zhu. Previously, I obtained my Bachelor’s degree from Zhiyuan College (ACM Class), Shanghai Jiao Tong University. I also spent a wonderful time as a visiting student with Prof. Song Han at MIT. My research interests lie at the intersection of machine learning, systems, and computer graphics. I am currently working on building efficient and hardware-friendly generative models with applications in computer vision and graphics.

Ji Lin (MIT)
Chenlin Meng (Stanford University)
Stefano Ermon (Stanford)
Song Han (MIT)
Jun-Yan Zhu (Carnegie Mellon University)

More from the Same Authors