Workshop: Machine Learning in Structural Biology

Deep generative models create new and diverse protein structures

Zeming Lin · Tom Sercu · Yann LeCun · Alex Rives


We explore the use of modern variational autoencoders for generating protein structures. Models are trained across a diverse set of natural protein domains. Three-dimensional structures are encoded implicitly in the form of an energy function that expresses constraints on pairwise distances and angles. Atomic coordinates are recovered by optimizing the parameters of a rigid body representation of the protein chain to fit the constraints. The model generates diverse structures across a variety of folds and exhibits local coherence at the level of secondary structure, generating alpha helices and beta sheets, as well as globally coherent tertiary structure. A number of generated protein sequences have high-confidence predictions by AlphaFold that agree with their designs. The majority of these have no significant sequence homology to natural proteins.
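
The coordinate-recovery step can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several simplifying assumptions: the energy is a plain sum of squared errors on pairwise distances only (no angle terms), the optimized parameters are free Cartesian coordinates rather than a rigid body representation of the chain, and the optimizer is fixed-step gradient descent. The names `ideal_helix`, `distance_energy`, and `fit_coordinates` are hypothetical and chosen for illustration.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (n, n) matrix of Euclidean distances between points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_energy(coords, target):
    """Sum over pairs i < j of (d_ij - target_ij)^2 (assumed toy energy)."""
    residual = pairwise_distances(coords) - target
    return 0.5 * float((residual ** 2).sum())  # 0.5: each pair appears twice


def fit_coordinates(target, init=None, n_steps=5000, lr=0.02, seed=0):
    """Fit 3D coordinates to target distances by plain gradient descent."""
    n = target.shape[0]
    if init is None:
        # Random start, scaled to the typical size of the target distances.
        rng = np.random.default_rng(seed)
        coords = rng.normal(size=(n, 3)) * target.mean()
    else:
        coords = np.array(init, dtype=float)
    for _ in range(n_steps):
        diff = coords[:, None, :] - coords[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, 1.0)  # avoid division by zero on the diagonal
        residual = dist - target
        np.fill_diagonal(residual, 0.0)
        # dE/dx_i = 2 * sum_j (d_ij - t_ij) * (x_i - x_j) / d_ij
        grad = 2.0 * ((residual / dist)[:, :, None] * diff).sum(axis=1)
        coords -= lr * grad
    return coords


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Approximate C-alpha trace of an alpha helix (illustrative geometry)."""
    angles = np.deg2rad(turn_deg) * np.arange(n)
    return np.stack(
        [radius * np.cos(angles), radius * np.sin(angles), rise * np.arange(n)],
        axis=1,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_coords = ideal_helix(8)
    target = pairwise_distances(true_coords)
    start = true_coords + 0.1 * rng.normal(size=true_coords.shape)
    fitted = fit_coordinates(target, init=start)
    print("energy before:", distance_energy(start, target))
    print("energy after: ", distance_energy(fitted, target))
```

Pairwise distances are unchanged by reflection, so a distance-only energy like this one cannot distinguish a structure from its mirror image; angle constraints of the kind the abstract mentions are one way to resolve that ambiguity. From a random start, this simple optimizer may also settle in a local minimum.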