
Workshop: Machine Learning in Structural Biology Workshop

TopoDiff: Improve Protein Backbone Generation with Topology-aware Latent Encoding

Yuyang Zhang · Zinnia Ma · Haipeng Gong


The de novo design of protein structures is an intriguing research topic in protein engineering. Recent breakthroughs in diffusion-based generative models have shown substantial promise in generating diverse and realistic protein structures. However, existing models focus either on unconditional generation or on fine-grained conditioning at the residue level, and a holistic, top-down approach to controlling the overall topological arrangement is still lacking. In response, we introduce TopoDiff, a diffusion-based framework augmented with a topology encoding module. TopoDiff learns, without supervision, a compact latent representation of natural protein topologies with interpretable characteristics, and simultaneously uses this learned information for controllable protein structure generation. We also propose a novel metric specifically designed to assess the coverage of sampled proteins with respect to the natural protein space. In comparisons with existing models, our generative model achieves comparable performance on established metrics and also exhibits better coverage of the recognized topology landscape. In summary, TopoDiff offers a novel approach to enhancing the controllability and comprehensiveness of de novo protein structure generation, opening new possibilities for applications in protein engineering and beyond.
