Workshop: Machine Learning in Structural Biology Workshop

DIFFMASIF: Score-Based Diffusion Models for Protein Surfaces

Mehmet Akdel · Freyr Sverrisson · Dylan Abramson · Jean Feydy · Alexander Goncearenco · Yusuf Adeshina · Daniel Kovtun · Céline Marquet · Xuejin Zhang · David Baugher · Zachary Carpenter · Luca Naef · Michael Bronstein · Bruno Correia

Abstract: Predicting protein-protein complexes is a central challenge of computational structural biology. However, existing state-of-the-art methods rely on co-evolution learned from large amino acid sequence datasets and thus often fall short on both transient and engineered interfaces (which are of particular interest in therapeutic applications), where co-evolutionary signals are absent or minimal. To address this, we introduce DIFFMASIF, a novel score-based diffusion model for rigid protein-protein docking. Instead of sequence-based features, DIFFMASIF uses a protein molecular surface-based encoder-decoder architecture trained via a novel combination of geometric pre-training tasks to effectively learn physical complementarity. The encoder uses learned geometric features extracted from protein surface point clouds as well as geometrically pre-trained residue embeddings pooled to the surface. It directly learns binding site complementarity through prediction of contact sites as both a pre-training task and an auxiliary loss, and also allows known binding sites to be specified during inference. The encoder is followed by a decoder that predicts rotation and translation via $\mathrm{SO}(3)$ diffusion. We show that DIFFMASIF achieves state-of-the-art performance among deep learning methods for rigid-body docking, in particular on structurally novel interfaces and on interfaces with low sequence conservation. This provides a significant advance towards accurate modelling of protein interactions with low co-evolution and their many practical applications.
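
To make the rigid-docking setting concrete, the sketch below shows what it means for a decoder to output a rotation and a translation: one protein's surface point cloud is moved as a rigid body, so its internal geometry is unchanged and only its pose relative to the partner varies. This is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, the rotation is built from an axis and an angle with Rodrigues' formula, and the random perturbation uses a uniform angle and a Gaussian translation as a simple stand-in for the noise distributions an $\mathrm{SO}(3)$ diffusion model would actually use.

```python
import math
import random


def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 matrix rotating by `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm  # normalise the axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def apply_rigid_transform(points, rotation, translation):
    """Rotate `points` about their centroid, then shift them by `translation`."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    moved = []
    for p in points:
        d = [p[i] - centroid[i] for i in range(3)]  # centroid-relative coordinates
        moved.append([
            sum(rotation[i][j] * d[j] for j in range(3)) + centroid[i] + translation[i]
            for i in range(3)
        ])
    return moved


def random_rigid_perturbation(rng, max_angle, sigma):
    """Illustrative pose noise (assumption): random axis, uniform angle, Gaussian shift."""
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    angle = rng.uniform(0.0, max_angle)
    translation = [rng.gauss(0.0, sigma) for _ in range(3)]
    return rotation_from_axis_angle(axis, angle), translation


if __name__ == "__main__":
    # Toy "surface" of three points standing in for a ligand point cloud.
    ligand = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    rot, shift = random_rigid_perturbation(random.Random(0), math.pi, 1.0)
    posed = apply_rigid_transform(ligand, rot, shift)
    # Pairwise distances are preserved because the transform is rigid.
    print(math.dist(ligand[0], ligand[1]), math.dist(posed[0], posed[1]))
```

Rotating about the centroid keeps the rotation and translation components decoupled, which is a common convention for rigid-body pose updates. Whether the paper uses exactly this parameterisation is not stated in the abstract.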
