Skip to yearly menu bar Skip to main content

Workshop: Machine Learning in Structural Biology Workshop

TriFold: A New Architecture for Predicting Protein Sequences from Structural Data

Harish Srinivasan · Jian Zhou


The inverse protein folding challenge aims to identify specific amino acid sequences that fold into a predetermined protein structure. Despite advancements like AlphaFold2, it remains a complex issue in protein engineering. This paper introduces a novel architecture inspired by the self-attention mechanisms in AlphaFold2 and RoseTTAFold2, adapted for solving the inverse folding problem. Our approach, contrasted with previous graph-based models, leverages attention-based transformer architecture to efficiently integrate information across the entire protein. We combine attention mechanisms, such as invariant point attention, with those designed for sequence and pair representations, resulting in enhanced performance in the inverse protein folding task. Furthermore, we introduce a novel feature representation of protein structure used as an inductive bias in pair representation. The proposed model is trained and tested using the OpenFold codebase on the Protein Data Bank and the AlphaFold distillation dataset, achieving performance improvements over ProteinMPNN regarding sequence recovery. The model's validation on the CAMEO dataset, which comprises proteins released from October 16th, 2021 – January 16th, 2022, further substantiates its efficacy in enhanced sequence recovery across short, single, and multiple chains.

Chat is not available.