Workshop: Machine Learning in Structural Biology Workshop

Large-scale self-supervised pre-training on protein three-dimensional structures

Ilya Senatorov


Recent developments in protein structure prediction have led to a drastic increase in the number of available protein three-dimensional structures. This presents both a challenge and an opportunity: finding suitable approaches for utilising these new datasets in various machine learning settings. In this paper, we propose STEP (STructural Embedding of Proteins), a self-supervised learning approach for creating meaningful embeddings of protein structures, and demonstrate its utility in a variety of downstream tasks. We study several approaches to this problem, including deep metric learning as well as simple label prediction tasks. We demonstrate the superiority of STEP over existing models on these downstream tasks, including the prediction of drug-target interactions. For especially challenging tasks, such as predicting drugs for new proteins, our model improves on previous methods by up to 0.1 AUROC.
