
Workshop: Machine Learning in Structural Biology Workshop

Conformational sampling and interpolation using language-based protein folding neural networks

Diego del Alamo · Jeliazko Jeliazkov · Daphne Truan · Joel Karpiak


Protein language models (PLMs), such as ESM2, learn a rich semantic grammar of the protein sequence space. When coupled to protein folding neural networks (e.g., ESMFold), they can facilitate the prediction of tertiary and quaternary protein structures with high accuracy. However, they are limited to modeling protein structures in single states. This manuscript demonstrates that ESMFold can predict alternate conformations of some proteins, including de novo designed proteins. Randomly masking the sequence prior to PLM input returned alternate embeddings that ESMFold sometimes mapped to distinct, physiologically relevant conformations. From there, inversion of the ESMFold trunk facilitated the generation of high-confidence interconversion paths between the two states. These paths provide a deeper glimpse of how language-based protein folding neural networks derive structural information from high-dimensional sequence representations, while exposing limitations in their general understanding of protein structure and folding.
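
The random-masking step described above can be illustrated with a minimal sketch. The `<mask>` token string, the 15% masking fraction, the example sequence, and the helper name `randomly_mask` are all assumptions chosen for illustration; the abstract does not specify them. Only the masking itself is shown: each masked variant would then be passed to the PLM and folding network.

```python
import random

# Assumed mask token for illustration; the abstract does not name one.
MASK_TOKEN = "<mask>"


def randomly_mask(sequence, mask_fraction=0.15, seed=None):
    """Return the sequence as a token list with a random subset of residues masked.

    mask_fraction is a hypothetical default, not a value from the abstract.
    """
    rng = random.Random(seed)
    n_mask = round(len(sequence) * mask_fraction)
    # Sample distinct positions to replace with the mask token.
    masked_positions = set(rng.sample(range(len(sequence)), n_mask))
    return [
        MASK_TOKEN if i in masked_positions else residue
        for i, residue in enumerate(sequence)
    ]


# Hypothetical example sequence; each seed yields a different masked variant.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
variants = [randomly_mask(sequence, mask_fraction=0.15, seed=s) for s in range(4)]
```

Different seeds give different masked inputs, which is what would allow the downstream model to return different embeddings for the same underlying sequence.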
