Workshop: Machine Learning in Structural Biology Workshop

ExpressUrself: A spatial model for predicting recombinant expression from mRNA sequence

Michael P Dunne · Javier Caceres-Delpiano


Maximising the yield of recombinantly expressed proteins is a critical part of any protein engineering pipeline. In most cases, the expression of a given protein can be tuned by adjusting its DNA coding sequence; however, finding coding sequences that optimise expression is a nontrivial task. The three-dimensional structure of mRNA is known to strongly influence protein expression levels through its effect on the efficiency of ribosome attachment. Although correlations between mRNA structure and expression are well established, no model to date has used this information effectively to predict expression levels accurately. Here we present ExpressUrself, a model designed to capture spatial characteristics of the sequence surrounding the start codon of an mRNA transcript and intended for use in optimising protein expression. The model is trained and tested on a large data set of variant DNA sequences and predicts the expression of previously unseen transcripts with a high degree of accuracy.
