
Workshop: Machine Learning in Structural Biology Workshop

ESMFold Hallucinates Native-Like Protein Sequences

Jeliazko Jeliazkov · Diego del Alamo · Joel Karpiak


We describe protein sequence design by inverting the protein structure prediction algorithm ESMFold, which achieves high accuracy by relying on evolutionary patterns derived from a pretrained protein language model (PLM; ESM2). In principle, inverting ESMFold allows protein sequences to be designed to fulfill one or more design objectives, such as high prediction confidence, predicted protein binding, or other geometric constraints that can be expressed as loss functions. In practice, sequences designed using an inverted AlphaFold model, termed AFDesign, contained unnatural sequence profiles and were shown to express poorly, whereas an inverted RoseTTAFold network was shown to be sensitive to adversarial sequences. Here, we demonstrate that these limitations do not extend to neural networks that include PLMs, such as ESMFold. Our inverted model, termed ESM-Design, generates sequences whose profiles are more native-like, and which are more likely to express, than sequences generated using AFDesign. However, these sequences are less likely to express than sequences rescued by the structure-based design method ProteinMPNN. The safeguard offered by the PLM comes at the cost of steep increases in memory consumption, preventing proteins longer than 150 residues from being modeled on a single GPU with 80 GB of VRAM. During this investigation, we also examined the role played by different sequence initialization schemes and found that random sampling of discrete amino acids improved convergence and model quality compared with any continuous random initialization method. Finally, we show how this approach can be used to introduce sequence and structure diversification in small proteins such as ubiquitin while respecting the sequence conservation of active-site residues. Our results highlight how architectural differences between structure prediction networks affect zero-shot protein design.
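The abstract contrasts two families of starting points for the design optimization: random discrete amino-acid sequences versus continuous random initializations. It does not say how ESM-Design represents a sequence internally, so the sketch below assumes, as is common in gradient-based sequence design, a (length × 20) matrix giving one distribution over the 20 canonical amino acids per position. The function names (`discrete_init`, `continuous_init`, `to_sequence`, `mean_entropy`) and the Gaussian-logit choice for the continuous case are hypothetical illustrations, not the authors' code. The sketch only shows how the two starting points differ: a discrete sample is a valid sequence with zero per-position entropy, while a continuous sample is a diffuse profile that does not correspond to any single sequence.

```python
import numpy as np

# The 20 canonical amino acids in one-letter code (this ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def discrete_init(length, rng):
    """Hypothetical discrete scheme: sample a random sequence, encode it one-hot as (length, 20)."""
    idx = rng.integers(0, len(AMINO_ACIDS), size=length)
    one_hot = np.zeros((length, len(AMINO_ACIDS)))
    one_hot[np.arange(length), idx] = 1.0
    return one_hot


def continuous_init(length, rng, scale=1.0):
    """Hypothetical continuous scheme: Gaussian logits softmaxed into a (length, 20) profile."""
    logits = rng.normal(0.0, scale, size=(length, len(AMINO_ACIDS)))
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def to_sequence(profile):
    """Collapse a (length, 20) profile to a sequence string via the per-position argmax."""
    return "".join(AMINO_ACIDS[i] for i in profile.argmax(axis=1))


def mean_entropy(profile):
    """Average per-position Shannon entropy (nats); 0 for one-hot rows, at most log(20)."""
    p = np.clip(profile, 1e-12, 1.0)  # avoid log(0); zero entries still contribute 0
    return float(-(profile * np.log(p)).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    start_discrete = discrete_init(76, rng)      # 76 residues, the length of ubiquitin
    start_continuous = continuous_init(76, rng)
    print(to_sequence(start_discrete))
    print(mean_entropy(start_discrete), mean_entropy(start_continuous))
```

In an actual inversion loop, either matrix would be the variable updated by gradients of the design loss (for example, negative prediction confidence) computed through the frozen structure predictor; that loop is not shown here because it depends on ESMFold internals the abstract does not describe.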
