Token-Level Guided Discrete Diffusion for Membrane Protein Design
Abstract
Reparameterized diffusion models (RDMs) have recently matched autoregressive methods in protein generation, motivating their application to membrane proteins, which contain interleaved soluble and transmembrane (TM) regions. We present the MeMbrane Diffusion Language Model (MeMDLM), a fine-tuned RDM-based protein language model for controllable membrane protein design. MeMDLM-generated sequences recapitulate the TM residue density and structural features of natural membrane proteins and outperform state-of-the-art diffusion baselines in motif scaffolding, achieving lower perplexity, higher BLOSUM-62 scores, and improved pLDDT confidence. To introduce specific functional properties into our designs, we develop Per-Token Guidance (PET), a classifier-guided sampling strategy that solubilizes sequences while preserving conserved TM domains, reducing TM density without disrupting functional cores. Importantly, wet-lab validation with the TOXCAT β-lactamase assay confirms that high-quality MeMDLM designs insert into membranes, distinguishing them from poor designs. Together, MeMDLM and PET establish the first experimentally validated diffusion-based framework for rational membrane protein generation.