A Scalable Latent Diffusion Model for Single-Cell Gene Expression Data
Abstract
Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. The difficulty arises from the unbounded count values of expression data and the complex dependencies within gene sets. Existing generative models often impose artificial gene orderings, which reduces both their flexibility and their biological relevance. We introduce a scalable latent diffusion model for single-cell gene expression that respects the fundamental exchangeability property of gene measurements. Unlike existing approaches that require artificial orderings or complex hierarchies, we propose a streamlined VAE that uses fixed-size latent variables with permutation-invariant and permutation-equivariant components. Our unified Multi-head Cross-Attention Block (MCAB) serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder, which eliminates the need for separate mechanisms to handle varying gene sets. We enhance this framework by replacing the Gaussian prior with a latent diffusion model built on Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. Our approach naturally handles challenges of single-cell data such as high-dimensional sparsity, as demonstrated by its superior performance on unconditional and conditional cell generation benchmarks.
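To make the pooling and unpooling roles described above concrete, the following is a minimal sketch, not the MCAB implementation itself. It assumes a single attention head with no learned projections, and the names `pool`, `unpool`, `latent_queries`, and `gene_tokens` are hypothetical. It also assumes that the decoder queries are the gene tokens themselves, whereas a real model would more plausibly use learned gene embeddings. The sketch only checks the two symmetry properties: pooling with a fixed set of queries is unchanged when the genes are reordered, and unpooling reorders its outputs in the same way as its gene queries.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product cross-attention (assumption: no projections).
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * dot(q, k) for k in keys])
        # Weighted sum of the value vectors, one output row per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def pool(gene_tokens, latent_queries):
    # Encoder side: a fixed number of queries attends over a variable-size gene set.
    return cross_attention(latent_queries, gene_tokens, gene_tokens)

def unpool(gene_queries, latents):
    # Decoder side: one query per gene attends over the fixed-size latents.
    return cross_attention(gene_queries, latents, latents)

def close(a, b, tol=1e-9):
    # Element-wise comparison of two equally shaped lists of rows.
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))

random.seed(0)
dim, n_genes, n_latents = 4, 6, 2
gene_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_genes)]
latent_queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_latents)]

# Reorder the genes with a random permutation.
perm = list(range(n_genes))
random.shuffle(perm)
shuffled = [gene_tokens[i] for i in perm]

z = pool(gene_tokens, latent_queries)
z_shuffled = pool(shuffled, latent_queries)

y = unpool(gene_tokens, z)
y_shuffled = unpool(shuffled, z)

# Invariance: the latents do not depend on gene order.
invariant = close(z, z_shuffled)
# Equivariance: permuting the gene queries permutes the outputs identically.
equivariant = close([y[i] for i in perm], y_shuffled)
print(invariant, equivariant)
```

Because the number of latent queries is fixed, the pooled representation `z` has the same shape regardless of how many genes are observed, which is what lets a single block handle varying gene sets in this sketch.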