Workshop: Machine Learning in Structural Biology Workshop

Explainable Deep Generative Models, Ancestral Fragments, and Murky Regions of the Protein Structure Universe

Eli Draizen · Cameron Mura · Philip Bourne


Modern proteins did not arise abruptly, as singular events, but rather over the course of at least 3.5 billion years of evolution. Can machine learning teach us how this occurred? The molecular evolutionary processes that yielded the intricate three-dimensional (3D) structures of proteins involve duplication, recombination, and mutation of genetic elements, corresponding to short peptide fragments. Identifying and elucidating these ancestral fragments is crucial to deciphering the interrelationships amongst proteins, as well as how evolution acts upon protein sequences, structures, and functions. Traditionally, structural fragments have been found using sequence and 3D structural alignment, but that becomes challenging when proteins have undergone extensive permutations, which allow two proteins to share a common architecture even though their topologies may differ drastically (a phenomenon termed the Urfold). We have designed a new framework to identify compact, potentially discontinuous peptide fragments by combining (i) deep generative models of protein superfamilies with (ii) layer-wise relevance propagation (LRP), which identifies the atoms most relevant to creating an embedding during an all-superfamilies × all-domains analysis. Our approach recapitulates known relationships amongst the evolutionarily ancient small beta-barrels (e.g. SH3 and OB folds) and P-loop-containing proteins (e.g. Rossmann and P-loop NTPases), previously established via manual analysis. Because of the generality of our deep model's approach, we anticipate that it can enable the discovery of new ancestral peptides. In a sense, our framework uses LRP as an 'explainable AI' approach, in conjunction with a recent deep generative model of protein structure (termed DeepUrfold), in order to leverage decades' worth of structural biology knowledge to decipher the underlying molecular bases for protein structural relationships, including those that are exceedingly remote yet can be discovered via deep learning.
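To make the role of LRP concrete, the following is a minimal sketch of the generic epsilon rule applied to a tiny fully connected ReLU network. It is not the DeepUrfold implementation: the dense layers, the function names (`lrp_epsilon`, `forward`, `explain`), and the random weights are all assumptions chosen for illustration. The idea it shows is the one named in the abstract: relevance assigned to a model's output is redistributed backwards, layer by layer, until each input feature (standing in here for per-atom features) carries a relevance score.

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    """Redistribute relevance R_out from a layer's outputs to its inputs.

    a: input activations (n_in,), W: weights (n_in, n_out), b: bias (n_out,).
    Generic LRP epsilon rule; illustrative only, not DeepUrfold's code.
    """
    z = a @ W + b                                  # pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)      # stabilise the division
    s = R_out / z                                  # relevance per unit of z
    return a * (W @ s)                             # share by contribution a_i * W_ij

def forward(x, layers):
    """Run the network, keeping the input to every layer.

    ReLU is applied on hidden layers; the last layer is linear.
    """
    activations = [x]
    for i, (W, b) in enumerate(layers):
        z = activations[-1] @ W + b
        is_last = i == len(layers) - 1
        activations.append(z if is_last else np.maximum(z, 0.0))
    return activations

def explain(x, layers, eps=1e-6):
    """Return (input relevance, network output) for a single input vector x."""
    activations = forward(x, layers)
    R = activations[-1]                            # start from the output itself
    for (W, b), a in zip(reversed(layers), reversed(activations[:-1])):
        R = lrp_epsilon(a, W, b, R, eps)
    return R, activations[-1]

# Hypothetical toy model: 5 input features, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
toy_layers = [
    (rng.normal(size=(5, 4)), np.zeros(4)),
    (rng.normal(size=(4, 2)), np.zeros(2)),
]
toy_input = rng.random(5)
relevance, output = explain(toy_input, toy_layers)
```

With zero biases and a small `eps`, the relevance scores sum (approximately) to the sum of the outputs, which is the conservation property that makes the per-feature scores interpretable as shares of the model's output.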
