

Qualcomm AI Research

Expo Demonstration

Mobile Video Diffusion Transformers

Ron Tindall

Upper Level Room 29A-D
Tue 2 Dec noon PST — 3 p.m. PST

Abstract:

We demonstrate Neodragon, the first video diffusion transformer (DiT) designed to run on low-power NPUs in mobile devices such as phones and laptops. Although DiTs incur large memory and computation costs from quadratic attention over thousands of video tokens, we show that mobile devices can run these models when they are designed for efficiency. To achieve this level of efficiency:

We replace the original large text encoder with a much smaller one, with minimal quality loss, through our novel distillation framework, which doesn’t require any image or video data.

We propose an asymmetric decoder distillation approach, which allows us to replace the native codec-latent-VAE decoder with a more efficient one without disturbing the generative latent space of the video generation pipeline.

With our block pruning strategy, we remove entire blocks from the MMDiT denoiser based on their relative importance and recover the original performance through a two-stage distillation process.

We reduce the diffusion sampling cost using our novel extended version of DMD (distribution matching distillation) for the pyramidal flow-matching objective.

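The block pruning idea listed above can be illustrated with a toy sketch. The abstract does not say how block importance is measured, so the proxy used here (relative change in the stack's output when one block is skipped) is an assumption, as are all names (`run_stack`, `block_importance`, `prune_blocks`) and the scalar "blocks" standing in for MMDiT layers. The distillation stages that recover quality after pruning are not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]  # returns the residual update for its input


def run_stack(blocks: Sequence[Block], x: Vector, skip: int = -1) -> Vector:
    """Apply residual blocks in order, optionally skipping the block at index `skip`."""
    h = list(x)
    for i, block in enumerate(blocks):
        if i == skip:
            continue
        update = block(h)
        h = [a + b for a, b in zip(h, update)]
    return h


def l2(v: Vector) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def block_importance(blocks: Sequence[Block], calibration: Sequence[Vector]) -> List[float]:
    """Assumed proxy: mean relative output change caused by skipping each block."""
    scores = []
    for i in range(len(blocks)):
        total = 0.0
        for x in calibration:
            full = run_stack(blocks, x)
            ablated = run_stack(blocks, x, skip=i)
            diff = [a - b for a, b in zip(full, ablated)]
            total += l2(diff) / (l2(full) + 1e-12)
        scores.append(total / len(calibration))
    return scores


def prune_blocks(
    blocks: Sequence[Block], scores: Sequence[float], num_to_remove: int
) -> Tuple[List[Block], List[int]]:
    """Drop the lowest-scoring blocks while keeping the original block order."""
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i])
    removed = set(ranked[:num_to_remove])
    kept = [i for i in range(len(blocks)) if i not in removed]
    return [blocks[i] for i in kept], kept


def make_scaling_block(scale: float) -> Block:
    """Toy residual block whose update is a scaled copy of its input."""
    return lambda h: [scale * a for a in h]


# Four toy blocks; the ones with tiny scales barely change the output.
toy_blocks = [make_scaling_block(s) for s in (0.5, 0.01, 0.3, 0.02)]
calibration = [[1.0, -2.0, 0.5], [0.1, 0.4, -0.3]]

scores = block_importance(toy_blocks, calibration)
pruned, kept_indices = prune_blocks(toy_blocks, scores, num_to_remove=2)
print(kept_indices)  # [0, 2]
```

In this toy setting the two blocks with near-zero updates are removed first. A real denoiser would need a calibration set of latents, timesteps, and text conditions, and the pruned model would then be fine-tuned against the original.
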
Neodragon generates 49 frames at 640x1024 resolution within 7.6 seconds on the Qualcomm Hexagon NPU, with a VBench total score of 81.61, setting a new state of the art for mobile video generation.

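As a rough reading of the reported figures, the short calculation below converts them into generation throughput. It uses only the numbers stated in the abstract and treats 7.6 seconds as the exact time, although the abstract says "within 7.6 seconds"; the variable names are illustrative.

```python
# Figures quoted in the abstract.
frames = 49
resolution = (640, 1024)   # pixel dimensions as written; the order does not affect the product
seconds = 7.6              # reported upper bound, treated as exact here

pixels_per_frame = resolution[0] * resolution[1]
total_pixels = frames * pixels_per_frame

frames_per_second = frames / seconds
megapixels_per_second = total_pixels / seconds / 1e6

print(f"{frames_per_second:.2f} frames generated per second")   # 6.45
print(f"{megapixels_per_second:.2f} megapixels per second")     # 4.23
```

These are generation rates, not playback rates; the abstract does not state the frame rate of the resulting video.
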
"This Proposal is provided for review and evaluation purposes only. Do not redistribute to any third party without the express prior written consent of Qualcomm Technologies, Inc."
