
Workshop: Machine Learning in Structural Biology Workshop

DSMBind: an unsupervised generative modeling framework for binding energy prediction

Wengong Jin · Caroline Uhler · Nir HaCohen

presentation: Machine Learning in Structural Biology Workshop
Fri 15 Dec 6:30 a.m. PST — 3:05 p.m. PST


Predicting the binding between proteins and other molecules is a core question in biology. Geometric deep learning is a promising paradigm for protein-ligand and protein-protein binding energy prediction, but its accuracy is limited by the size of the available training data, because high-throughput binding assays are expensive. Unsupervised learning, such as protein language models, is particularly useful in this setting because it does not need experimental binding energy data for training. In this work, we propose DSMBind, a new generative modeling framework for protein complex structures, and show that the likelihood of a crystal structure is highly correlated with its binding energy. Specifically, DSMBind learns an energy-based model from a training set of unlabeled crystal structures via SE(3) denoising score matching (DSM), where we perturb a protein complex by randomly rotating its backbone and side chains. We find that the learned energy is highly correlated with experimental binding affinity across multiple benchmarks, including protein-ligand binding, antibody-antigen binding, and mutation effect prediction for protein-protein binding. DSMBind not only outperforms unsupervised learning methods based on protein language models or inverse folding, but also matches the performance of state-of-the-art supervised models trained on experimental binding data.
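
To make the denoising score matching idea concrete, here is a minimal sketch on a toy problem. It is not the DSMBind implementation: it assumes one-dimensional data, Gaussian translation noise in place of SE(3) rotations, and a hypothetical quadratic energy E(x) = 0.5 * k * (x - mu)^2 in place of a neural network. The function names `fit_dsm_quadratic_energy` and `energy` are illustrative only. The sketch shows the core mechanism the abstract describes: clean samples are perturbed, and the model's score (the negative gradient of its energy) is regressed onto the score of the perturbation kernel, with no labels involved.

```python
import random


def fit_dsm_quadratic_energy(clean, sigma, rng):
    """Fit the toy energy E(x) = 0.5 * k * (x - mu)**2 by denoising score matching.

    Assumption: Gaussian noise on a 1-D coordinate stands in for the
    SE(3) rotation noise described in the abstract.
    """
    # Perturb each clean sample with Gaussian noise of scale sigma.
    noisy = [x + sigma * rng.gauss(0.0, 1.0) for x in clean]
    # DSM regression target: the score of the Gaussian perturbation kernel,
    # d/dx_noisy log q(x_noisy | x) = -(x_noisy - x) / sigma**2.
    target = [-(xn - x) / sigma**2 for xn, x in zip(noisy, clean)]

    # The model score -dE/dx = a * x + b (with a = -k, b = k * mu) is linear
    # in x, so the squared-error DSM loss is minimized by least squares.
    n = len(noisy)
    mean_x = sum(noisy) / n
    mean_t = sum(target) / n
    cov = sum((xn - mean_x) * (t - mean_t) for xn, t in zip(noisy, target)) / n
    var = sum((xn - mean_x) ** 2 for xn in noisy) / n
    a = cov / var
    b = mean_t - a * mean_x

    k = -a
    mu = b / k
    return k, mu


def energy(x, k, mu):
    """Learned toy energy; lower values mean higher model likelihood."""
    return 0.5 * k * (x - mu) ** 2


# Unlabeled "training set": samples around 2.0 with standard deviation 0.5.
rng = random.Random(0)
clean = [rng.gauss(2.0, 0.5) for _ in range(20000)]
k, mu = fit_dsm_quadratic_energy(clean, sigma=0.5, rng=rng)
```

In the infinite-data limit the minimizer of this toy objective matches the noised data distribution, so `k` approaches 1 / (0.5^2 + 0.5^2) = 2 and `mu` approaches 2. Points far from the training data then receive higher energy than points near it, which is the property the abstract relies on when it uses the learned energy as a proxy for binding affinity.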
