ProteinNPT: Improving protein property prediction and design with non-parametric transformers
Pascal Notin · Ruben Weitzman · Debora Marks · Yarin Gal
Great Hall & Hall B1+B2 (level 1) #308
Protein design holds immense potential for optimizing naturally occurring sequences, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering face significant challenges, including an expansive design space, sparse functional regions, and a scarcity of available labels. Furthermore, real-life design scenarios often require optimizing multiple properties simultaneously, which exacerbates label sparsity. In this paper, we present ProteinNPT, a non-parametric transformer variant tailored for protein sequences and particularly suited to label-scarce and multi-task optimization settings. We first expand the ProteinGym benchmark to evaluate models in supervised settings and develop several cross-validation schemes for robust assessment. Subsequently, we reimplement existing top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of the protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design in several in silico Bayesian optimization experiments.