Recently, a number of works have demonstrated successful applications of a fully data-driven approach to protein design, based on learning generative models of the distribution of a family of evolutionarily related sequences. Language modelling techniques promise to generalise this design paradigm across protein space; however, they have for the most part neglected the rich evolutionary signal in multiple sequence alignments (MSAs) and relied on fine-tuning to adapt the learned distribution to a particular family. Inspired by the recent development of alignment-based language models, exemplified by the MSA Transformer, we propose a novel alignment-based generative model that combines an input MSA encoder with an autoregressive sequence decoder, yielding a generative sequence model that can be explicitly conditioned on evolutionary context. To test the benefits of this generative, MSA-based approach in design-relevant settings, we focus on the problem of unsupervised fitness landscape modelling. Across three unusually diverse fitness landscapes, we find evidence that directly modelling the distribution over full sequence space leads to improved unsupervised prediction of variant fitness compared to scores computed with non-generative masked language models. We believe that combining explicit encoding of evolutionary information with a generative decoder's representation of a distribution over sequence space provides a powerful framework that generalises traditional family-based generative models.
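The core idea of scoring variants with an MSA-conditioned autoregressive model can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The learned MSA encoder and Transformer decoder are replaced by a hypothetical counting-based stand-in (`next_residue_probs`) that conditions each position on the MSA column and the previous residue. Variants are scored by the log-likelihood ratio log p(variant | MSA) - log p(wild type | MSA), a common choice for generative models that the abstract does not itself specify. All names are hypothetical, and every sequence is assumed to be aligned to the MSA (same length, 20 amino acids plus gap).

```python
import math
from collections import Counter

# Assumed alphabet: 20 standard amino acids plus the alignment gap symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def next_residue_probs(prefix, msa, pseudocount=1.0):
    """Hypothetical stand-in for one step of an MSA-conditioned decoder.

    Estimates p(x_i | x_{i-1}, MSA) from column i of the alignment, using only
    MSA rows that share the previous residue (first-order context), with
    additive smoothing so every residue has non-zero probability.
    """
    i = len(prefix)
    rows = msa
    if i > 0:
        matching = [s for s in msa if s[i - 1] == prefix[-1]]
        rows = matching or msa  # fall back to the whole column if no row matches
    counts = Counter(s[i] for s in rows)
    total = len(rows) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_likelihood(seq, msa):
    """Autoregressive factorisation: log p(x | MSA) = sum_i log p(x_i | x_<i, MSA)."""
    return sum(
        math.log(next_residue_probs(seq[:i], msa)[aa]) for i, aa in enumerate(seq)
    )


def fitness_score(variant, wild_type, msa):
    """Assumed unsupervised fitness proxy: log-likelihood ratio vs. the wild type."""
    return log_likelihood(variant, msa) - log_likelihood(wild_type, msa)


if __name__ == "__main__":
    msa = ["MKVL", "MKIL", "MRVL", "MKVL"]  # toy alignment, not real data
    # Negative: the toy model finds the V->I variant less likely than the wild type.
    print(fitness_score("MKIL", "MKVL", msa))
```

Because an autoregressive decoder assigns a normalised probability to every full sequence, a multi-mutant can be scored in one pass over the sequence rather than through per-position masked predictions; in the real model, the counting stand-in would be replaced by the learned encoder-decoder.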
Author Information
Alex Hawkins-Hooker (University College London)
David Jones (University College London)
Brooks Paige (UCL)
More from the Same Authors
2022: Using domain-domain interactions to probe the limitations of MSA pairing strategies
Alex Hawkins-Hooker · David Jones · Brooks Paige
2022: Towards Healing the Blindness of Score Matching
Mingtian Zhang · Oscar Key · Peter Hayes · David Barber · Brooks Paige · Francois-Xavier Briol
2023 Poster: Moment Matching Denoising Gibbs Sampling
Mingtian Zhang · Alex Hawkins-Hooker · Brooks Paige · David Barber
2020 Workshop: Machine Learning for Molecules
José Miguel Hernández-Lobato · Matt Kusner · Brooks Paige · Marwin Segler · Jennifer Wei
2019: Molecules and Genomes
David Haussler · Djork-Arné Clevert · Michael Keiser · Alan Aspuru-Guzik · David Duvenaud · David Jones · Jennifer Wei · Alexander D'Amour