Skip to yearly menu bar Skip to main content

Workshop: Machine Learning in Structural Biology Workshop

Predicting Immune Escape with Pretrained Protein Language Model Embeddings

Kyle Swanson · Howard Chang · James Zou


Assessing the severity of new pathogenic variants requires an understanding of which mutations will escape the human immune response. Even single point mutations to an antigen can cause immune escape and infection via abrogation of antibody binding. Recent work has modeled the effect of single point mutations on proteins by leveraging the information contained in large-scale, pretrained protein language models. These models are often applied in a zero-shot setting, where the effect of each mutation is predicted based on the output of the language model with no additional training. However, this approach cannot appropriately model immune escape, which involves the interaction of two proteins---antibody and antigen---instead of one and requires making different predictions for the same antigenic mutation in response to different antibodies. Here, we explore several methods for predicting immune escape by building models on top of embeddings from pretrained protein language models. We evaluate our methods on a SARS-CoV-2 deep mutational scanning dataset and show that our embedding-based methods significantly outperform zero-shot methods, which have almost no predictive power. We additionally highlight insights into how best to use embeddings from pretrained protein language models to predict escape.

Chat is not available.