Workshop: Machine Learning in Structural Biology Workshop

Learning from physics-based features improves protein property prediction

Amy Wang · Ava Soleimany · Alex X Lu · Kevin Yang


Data-based and physics-based methods have long been considered as distinct approaches for protein property prediction. However, they share complementary strengths, such that integrating physics-based features with machine learning may improve model generalizability and accuracy. Here, we demonstrate that incorporating pre-computed energetic features in machine learning models improves performance in out-of-distribution and low training data regimes in a proof of concept study with two distinct protein engineering tasks. By training with sequence, structure, and pre-computed Rosetta energy features on graph neural nets, we achieve performance comparable to masked inverse folding pretraining with the same architecture.

Chat is not available.