Workshop: Machine Learning in Structural Biology

Deep generative models create new and diverse protein structures

Zeming Lin · Tom Sercu · Yann LeCun · Alex Rives


We explore the use of modern variational autoencoders for generating protein structures. Models are trained across a diverse set of natural protein domains. Three-dimensional structures are encoded implicitly in the form of an energy function that expresses constraints on pairwise distances and angles. Atomic coordinates are recovered by optimizing the parameters of a rigid-body representation of the protein chain to fit the constraints. The model generates diverse structures across a variety of folds and exhibits local coherence at the level of secondary structure, generating alpha helices and beta sheets, as well as globally coherent tertiary structure. A number of generated protein sequences have high-confidence AlphaFold predictions that agree with their designs. The majority of these have no significant sequence homology to natural proteins.
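
The coordinate-recovery step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the energy is a plain sum of squared errors on pairwise distances only (no angle terms), and it optimizes Cartesian coordinates directly by gradient descent instead of the parameters of a rigid-body chain representation. All function names and parameters (distance_energy, fit_coordinates, steps, lr) are hypothetical.

```python
import numpy as np

def pairwise_distances(coords):
    """Return the (N, N) matrix of Euclidean distances between N points."""
    diff = coords[:, None, :] - coords[None, :, :]
    # Small epsilon keeps the square root differentiable on the diagonal.
    return np.sqrt((diff ** 2).sum(-1) + 1e-12)

def distance_energy(coords, target):
    """Assumed energy: sum over pairs i<j of (distance - target distance)^2."""
    dist = pairwise_distances(coords)
    # The full matrix counts every pair twice, hence the factor 0.5.
    return 0.5 * ((dist - target) ** 2).sum()

def energy_gradient(coords, target):
    """Analytic gradient of distance_energy with respect to coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    coef = 2.0 * (dist - target) / dist
    return (coef[:, :, None] * diff).sum(axis=1)

def fit_coordinates(target, init, steps=500, lr=0.05):
    """Gradient descent with step halving so the energy never increases."""
    coords = init.copy()
    energy = distance_energy(coords, target)
    for _ in range(steps):
        grad = energy_gradient(coords, target)
        step = lr
        while step > 1e-8:
            trial = coords - step * grad
            trial_energy = distance_energy(trial, target)
            if trial_energy < energy:
                coords, energy = trial, trial_energy
                break
            step *= 0.5  # Backtrack when the step overshoots.
    return coords, energy

# Toy usage: the target distances come from a random 8-point "chain".
rng = np.random.default_rng(0)
reference = rng.normal(size=(8, 3))
target = pairwise_distances(reference)
start = rng.normal(size=(8, 3))
fitted, final_energy = fit_coordinates(target, start)
```

Because pairwise distances are invariant to rotation, translation, and reflection, a fit of this kind recovers coordinates only up to those transformations, and a random starting point may settle in a local minimum.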
