
Protein structure generation via folding diffusion
Kevin Wu · Kevin Yang · Rianne van den Berg · James Zou · Alex X Lu · Ava Amini

The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments for currently incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a new diffusion-based generative model that designs protein backbone structures via a procedure that mirrors the native folding process. We describe protein backbone structure as a series of consecutive angles capturing the relative orientation of the constituent amino acid residues, and generate new structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins twist into energetically favorable conformations, but the inherent shift and rotation invariance of this representation also removes the need for complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that the resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally occurring proteins. As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.
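The two ideas in the abstract, an angle-based backbone representation that is unchanged by shifts and rotations and a diffusion model that noises those angles, can be illustrated with a short NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' released code. It computes only dihedral angles over a generic toy chain of points, while a real backbone would use the actual backbone atoms and possibly additional angle types. The cosine noise schedule and the wrapping of noisy angles back onto the circle are assumptions chosen for illustration. All helper names (`dihedral`, `backbone_dihedrals`, `random_rotation`, `wrap`, `cosine_alpha_bar`, `q_sample`) are hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians, in [-pi, pi]) defined by four 3D points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1n = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return np.arctan2(y, x)

def backbone_dihedrals(coords):
    """Dihedral for every window of four consecutive points in an (N, 3) chain.
    Simplification (assumption): a real backbone uses specific atom types."""
    return np.array([dihedral(*coords[i:i + 4]) for i in range(len(coords) - 3)])

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the decomposition is unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # turn a reflection into a proper rotation
    return q

def wrap(theta):
    """Map angles into the interval [-pi, pi]."""
    return (theta + np.pi) % (2 * np.pi) - np.pi

def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level alpha_bar_t for a cosine schedule (assumed here)."""
    steps = np.arange(T + 1) / T
    f = np.cos((steps + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]

def q_sample(x0, t, alpha_bar, rng):
    """DDPM forward step x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps,
    with the noisy angles wrapped back onto the circle."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return wrap(xt), eps

rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(size=(12, 3)), axis=0)  # toy chain, not a real protein
angles = backbone_dihedrals(coords)

# Rotate and translate the whole chain: the angle representation does not change.
R = random_rotation(rng)
moved = coords @ R.T + np.array([5.0, -3.0, 2.0])
moved_angles = backbone_dihedrals(moved)
print(np.allclose(wrap(angles - moved_angles), 0.0, atol=1e-9))

# Near the final step, the angles are close to pure wrapped Gaussian noise (the "unfolded" state).
alpha_bar = cosine_alpha_bar(1000)
xt, eps = q_sample(angles, 999, alpha_bar, rng)
```

Because the angles are computed only from differences and relative orientations of consecutive points, any rigid motion of the chain leaves them unchanged, so a denoising network operating on them needs no built-in equivariance. Note that a reflection (det = -1) would flip the sign of every dihedral, which is why the sketch restricts itself to proper rotations.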

Author Information

Kevin Wu (Stanford University)
Kevin Yang (Microsoft)
Rianne van den Berg (Microsoft Research)
James Zou (Stanford University)
Alex X Lu (Microsoft Research)

Alex Lu is a Senior Researcher at Microsoft Research New England, in the BioML group. His research focuses on machine learning methods that enable biologists to discover new hypotheses from big biological datasets. Facets of his research program include self-supervised representation learning to characterize previously unknown classes or patterns; foundation models that make machine learning accessible to biologists working with more restricted datasets; and robustness and domain generalization, to ensure that methods detect biological signal rather than technical confounders.

Ava Amini (Microsoft Research)
