Workshop: Machine Learning in Structural Biology Workshop

Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC

Patrick Emami · Aidan Perreault · Jeffrey Law · David Biagioni · Peter St. John


A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations that improve the function of a known protein. We introduce a plug-and-play framework for evolving proteins in silico that supports mixing and matching a variety of generative models with discriminative models to help constrain the search to the proteins most likely to appear in nature. Our framework achieves this by sampling from a product-of-experts distribution defined in discrete protein space, and it does not require any model fine-tuning or re-training. Instead of resorting to sample-inefficient search based on random mutations, as is typical of previous plug-and-play algorithms for protein engineering, we propose a fast discrete sampler that uses gradients to efficiently identify promising mutations. Our in silico directed evolution experiments on wide fitness landscapes show that we efficiently discover variants with high evolutionary sequence likelihood and high estimated activity that are multiple mutations away from a wild-type protein. We analyze our framework across a range of evolutionary generative models, including a 650M-parameter protein language model.
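
To make the idea of sampling from a product of experts with a gradient-informed discrete sampler more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes two toy, site-independent "experts" given as random score matrices (stand-ins for a generative model and an activity predictor), hypothetical mixing weights, and a single-mutation proposal in the style of Gibbs-with-gradients samplers, corrected with a Metropolis-Hastings acceptance test. All names (`log_prob`, `proposal_log_probs`, `mcmc_step`) are illustrative.

```python
import numpy as np

def log_prob(x, experts, weights):
    """Unnormalized log-density of the product of experts at one-hot x (L, A)."""
    return sum(w * np.sum(W * x) for W, w in zip(experts, weights))

def grad_log_prob(x, experts, weights):
    """Gradient of log_prob w.r.t. the one-hot entries.
    Constant here because the toy experts are linear in x (an assumption)."""
    return sum(w * W for W, w in zip(experts, weights))

def proposal_log_probs(x, experts, weights):
    """Log-probabilities over all (site, residue) single mutations.
    A first-order estimate of the change in log_prob ranks the mutations."""
    g = grad_log_prob(x, experts, weights)
    current = np.sum(g * x, axis=1, keepdims=True)  # gradient at current residues
    logits = (g - current) / 2.0                    # estimated gain, tempered by 2
    logits = np.where(x == 1, -np.inf, logits)      # forbid "mutating" to itself
    logits = logits - logits.max()                  # numerical stability
    return logits - np.log(np.sum(np.exp(logits)))

def mcmc_step(x, experts, weights, rng):
    """One Metropolis-Hastings step with the gradient-informed proposal."""
    log_q_fwd = proposal_log_probs(x, experts, weights)
    flat = rng.choice(log_q_fwd.size, p=np.exp(log_q_fwd).ravel())
    site, residue = np.unravel_index(flat, log_q_fwd.shape)
    old = int(np.argmax(x[site]))
    x_new = x.copy()
    x_new[site] = 0.0
    x_new[site, residue] = 1.0
    # Probability of proposing the reverse mutation from the new sequence.
    log_q_rev = proposal_log_probs(x_new, experts, weights)
    log_alpha = (log_prob(x_new, experts, weights) - log_prob(x, experts, weights)
                 + log_q_rev[site, old] - log_q_fwd[site, residue])
    if np.log(rng.random()) < log_alpha:
        return x_new, True
    return x, False

# Toy setup: sequence length 8, alphabet of 20 amino acids (all values hypothetical).
rng = np.random.default_rng(0)
L, A = 8, 20
experts = [rng.normal(size=(L, A)), rng.normal(size=(L, A))]
weights = [1.0, 0.5]
wild_type = np.eye(A)[rng.integers(A, size=L)]

x, n_accepted = wild_type.copy(), 0
for _ in range(200):
    x, accepted = mcmc_step(x, experts, weights, rng)
    n_accepted += int(accepted)

n_mutations = int(np.sum(np.argmax(x, axis=1) != np.argmax(wild_type, axis=1)))
print("accepted steps:", n_accepted)
print("mutations from wild type:", n_mutations)
print("log-score gain:", log_prob(x, experts, weights) - log_prob(wild_type, experts, weights))
```

In this sketch the experts are never retrained; only their scores and gradients are queried, which is the sense in which such a scheme is plug and play. With real models, `grad_log_prob` would presumably come from automatic differentiation through each expert rather than from a closed-form expression.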