

Poster
in
Workshop: Generative AI and Biology (GenBio@NeurIPS2023)

ProteinRL: Reinforcement learning with generative protein language models for property-directed sequence design

Matt Sternke · Joel Karpiak

Keywords: [ Optimization ] [ Reinforcement Learning ] [ protein language models ]


Abstract: The overarching goal of protein engineering is the design and optimization of proteins customized for specific purposes. Generative protein language models (PLMs) allow for $\textit{de novo}$ protein sequence generation; however, current PLMs lack the ability to controllably generate sequences tailored to desired properties. Here we present ProteinRL, a flexible, data-driven reinforcement learning framework for fine-tuning generative PLMs for the $\textit{de novo}$ design of sequences optimized for specific sequence and/or structural properties. We highlight two example cases of realistic protein design goals: a single-objective design of sequences with unusually high charge content, and a multi-objective hit-expansion scenario that diversifies a target sequence with generated sequences having high-confidence structure predictions and high predicted probability of soluble expression. In both cases, ProteinRL fine-tuning guides the PLM toward generating sequences optimized for the defined properties, reaching values rarely or never seen in natural sequences or in sequences generated without ProteinRL fine-tuning. The demonstrated success and adaptability of the ProteinRL framework enable the $\textit{de novo}$ design of novel protein sequences optimized for applications across many areas of protein engineering.
