
Learning from physics-based features improves protein property prediction
Amy Wang · Ava Soleimany · Alex X Lu · Kevin Yang

Data-based and physics-based methods have long been considered distinct approaches to protein property prediction. However, they have complementary strengths, so integrating physics-based features with machine learning may improve model generalizability and accuracy. Here, in a proof-of-concept study on two distinct protein engineering tasks, we demonstrate that incorporating pre-computed energetic features into machine learning models improves performance in out-of-distribution and low-training-data regimes. By training graph neural networks on sequence, structure, and pre-computed Rosetta energy features, we achieve performance comparable to masked inverse folding pretraining with the same architecture.
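The abstract does not say exactly how the energy features enter the network. As a minimal sketch, assuming per-residue energy terms are simply concatenated onto one-hot sequence features at each node of a residue contact graph, one round of mean-aggregation message passing followed by a mean-pooled readout could look like the code below. The function names, the 8 Å C-alpha contact cutoff, and the two-term energy vectors are hypothetical illustrations rather than the authors' featurization or architecture, and no Rosetta API is called; the energy values are placeholders.

```python
import math
from typing import List, Sequence, Tuple

# Standard 20 amino acids, used for a one-hot sequence encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """One-hot encode a single amino-acid letter."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def contact_edges(coords: Sequence[Tuple[float, float, float]],
                  cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """Hypothetical structure featurization: connect residue pairs whose
    C-alpha coordinates lie within `cutoff` angstroms of each other."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def build_node_features(sequence: str,
                        energies: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate one-hot sequence features with per-residue energy terms.
    `energies[i]` is a placeholder for pre-computed energy terms of residue i."""
    if len(sequence) != len(energies):
        raise ValueError("need one energy vector per residue")
    return [one_hot(aa) + list(e) for aa, e in zip(sequence, energies)]


def mean_aggregate(features: List[List[float]],
                   edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of message passing: each node averages its own features
    with those of its graph neighbours (no learned weights in this sketch)."""
    neighbours = {i: [i] for i in range(len(features))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        nbrs = neighbours[i]
        out.append([sum(features[k][d] for k in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out


def mean_pool(node_features: List[List[float]]) -> List[float]:
    """Graph-level readout: average every feature dimension over all nodes."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


# Usage with made-up values: three residues, two energy terms each.
sequence = "ACD"
energies = [[-1.0, 0.5], [-2.0, 0.0], [0.5, 1.0]]
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_edges(coords)  # [(0, 1)]
h = mean_aggregate(build_node_features(sequence, energies), edges)
graph_vector = mean_pool(h)  # 22-dim input for a property-prediction head
```

Concatenation is the simplest way to expose physics-based terms to a graph model; a trained network would replace the unweighted averaging with learned message and update functions.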

Author Information

Amy Wang (Stanford University)
Ava Soleimany (Microsoft Research)
Alex X Lu (Microsoft Research)

I’m a Senior Researcher at Microsoft Research New England, in the BioML group. I’m interested in how machine learning can help us discover new insights from biological data by finding patterns that are too subtle or too large-scale to identify unassisted. I primarily focus on biological images, and my research often involves designing self-supervised learning methods, as I believe these methods are unbiased by prior knowledge.

Kevin Yang (Microsoft)
