Workshop: Machine Learning in Structural Biology Workshop

Uncovering sequence diversity from a known protein structure

Luca Alessandro Silva · Barthélémy Meynard · Carlo Lucibello · Christoph Feinauer


We present InvMSAFold, a method that generates a diverse set of protein sequences that fold into a single structure. For a given structure, it defines a probability distribution over the space of sequences. This distribution captures the second-order correlations observed in Multiple Sequence Alignments (MSAs) of homologous proteins. Our innovation lies in generating highly diverse protein sequences while preserving structural and functional integrity. This approach is promising for applications such as directed evolution, where it can provide diverse starting points for protein design.
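
To make the idea of a sequence distribution with second-order correlations concrete, the following is a minimal sketch of one common way to write such a distribution: a pairwise (Potts-like) model with per-site fields `h` and pairwise couplings `J`, sampled with Gibbs sampling. This is an illustration under stated assumptions, not the InvMSAFold implementation: the abstract does not specify the model form or the sampler, the names `random_params`, `energy`, and `gibbs_sample` are hypothetical, and the parameters here are random placeholders rather than quantities derived from a structure.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def random_params(length, q, rng, scale=0.1):
    """Random placeholder parameters (assumption: a real model would
    derive these from the input structure)."""
    h = rng.normal(scale=scale, size=(length, q))
    raw = rng.normal(scale=scale, size=(length, length, q, q))
    # Symmetrize so that J[i, j, a, b] == J[j, i, b, a].
    J = raw + raw.transpose(1, 0, 3, 2)
    return h, J


def energy(seq, h, J):
    """E(a) = -sum_i h[i, a_i] - sum_{i<j} J[i, j, a_i, a_j]."""
    length = len(seq)
    e = -sum(h[i, seq[i]] for i in range(length))
    for i in range(length):
        for j in range(i + 1, length):
            e -= J[i, j, seq[i], seq[j]]
    return float(e)


def gibbs_sample(h, J, n_sweeps, rng):
    """Draw one sequence from p(a) proportional to exp(-E(a))."""
    length, q = h.shape
    seq = rng.integers(q, size=length)
    for _ in range(n_sweeps):
        for i in range(length):
            # Conditional log-probabilities of site i given all other sites.
            logits = h[i].copy()
            for j in range(length):
                if j != i:
                    logits += J[i, j, :, seq[j]]
            p = np.exp(logits - logits.max())
            p /= p.sum()
            seq[i] = rng.choice(q, p=p)
    return seq


rng = np.random.default_rng(0)
h, J = random_params(length=8, q=len(AMINO_ACIDS), rng=rng)
sample = gibbs_sample(h, J, n_sweeps=10, rng=rng)
print("".join(AMINO_ACIDS[a] for a in sample), energy(sample, h, J))
```

Calling `gibbs_sample` repeatedly with the same `h` and `J` yields different sequences from the same distribution, which is the sense in which a single set of parameters can describe a diverse family of sequences.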

