Workshop: Machine Learning in Structural Biology Workshop

Fast protein structure searching using structure graph embeddings

Joe Greener · Kiarash Jamali


Comparing and searching protein structures independently of primary sequence has proved useful for remote homology detection, function annotation and protein classification. With the recent leap in accuracy of protein structure prediction methods and the increased availability of protein models, attention is turning to how best to make use of these data. Fast and accurate methods to search databases of millions of structures will be essential to this endeavour, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network to learn a low-dimensional embedding of protein structure, and show that the embedding can be used to query structures against large structural databases with accuracy comparable to current methods. The speed of the method and its ability to scale to millions of structures make it suitable for this structure-rich era.
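
To illustrate how a database query on embeddings could work, the following is a minimal sketch, not the authors' implementation. It assumes each structure has already been reduced to a fixed-length vector and that similarity is measured by cosine similarity; the abstract states neither the distance measure nor the embedding size. The names `normalize`, `search` and `database`, the 4-dimensional vectors and the domain labels are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query, database, top_k=3):
    """Return the top_k (name, cosine similarity) pairs, best match first.

    Assumption: cosine similarity is the ranking score. The abstract does
    not specify the measure used by the authors.
    """
    q = normalize(query)
    scored = []
    for name, embedding in database.items():
        e = normalize(embedding)
        scored.append((name, sum(a * b for a, b in zip(q, e))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings for illustration only. In practice these vectors
# would be produced by the trained graph neural network.
database = {
    "domain_A": [0.9, 0.1, 0.0, 0.2],
    "domain_B": [0.1, 0.8, 0.3, 0.0],
    "domain_C": [0.85, 0.15, 0.05, 0.25],
}

hits = search([1.0, 0.1, 0.0, 0.2], database, top_k=2)
for name, score in hits:
    print(f"{name}\t{score:.4f}")
```

Because each database entry is a short vector, a query reduces to one dot product per entry, which is what makes this kind of search plausible at the scale of millions of structures. A production system would likely precompute the normalized vectors and use a vectorized or approximate nearest-neighbour index rather than a Python loop.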
