
Workshop: Machine Learning in Structural Biology Workshop

ZymCTRL: a conditional language model for the controllable generation of artificial enzymes

Noelia Ferruz


The design of custom-tailored proteins has the potential to provide novel and groundbreaking solutions in many fields, including molecular medicine and environmental sciences. Among protein classes, enzymes are particularly attractive because their complex active sites can accelerate chemical reactions and transformations by several orders of magnitude. Because enzymes are biodegradable nanoscale materials, they hold unmatched promise as sustainable, large-scale industrial catalysts. Motivated by the enormous success of language models in designing novel yet nature-like proteins, we hypothesized that an enzyme-specific language model could open new opportunities for designing purpose-built artificial enzymes. Here, we describe ZymCTRL, a conditional language model trained on the BRENDA enzyme database that, given a user prompt, generates enzymes belonging to a specified enzymatic class. The artificial enzymes ZymCTRL generates are distant from natural ones, while predictions from orthogonal methods agree with their intended function. We release the model to the community.
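To illustrate what "generating enzymes of a specific class upon a user prompt" means mechanically, here is a deliberately tiny, hypothetical stand-in. It is a minimal sketch, assuming a class-conditional bigram model in place of ZymCTRL's actual language model: each next residue is sampled from counts conditioned on the prompted class label and the previous residue. The EC-number-style labels, the toy sequences (made up, not real enzymes), and the `train`/`generate` functions are all assumptions for illustration, not the ZymCTRL implementation or its prompt format.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy training data: class label -> example sequences.
# The sequences are invented for illustration and are not real enzymes.
TOY_DATA = {
    "1.1.1.1": ["MKVLAG", "MKVLGG", "MKALAG"],
    "3.1.1.1": ["MSTHWE", "MSTHWD", "MSHHWE"],
}

START = "<start>"  # hypothetical start-of-sequence context
END = "$"          # hypothetical end-of-sequence symbol


def train(data):
    """Count next-residue frequencies conditioned on (class label, previous token)."""
    counts = defaultdict(Counter)
    for label, seqs in data.items():
        for seq in seqs:
            prev = START
            for tok in list(seq) + [END]:
                counts[(label, prev)][tok] += 1
                prev = tok
    return counts


def generate(counts, label, max_len=50, seed=0):
    """Sample one sequence residue by residue, conditioned on the prompted class."""
    if (label, START) not in counts:
        raise KeyError(f"unknown class label: {label}")
    rng = random.Random(seed)
    prev, out = START, []
    for _ in range(max_len):
        dist = counts[(label, prev)]
        tokens, weights = zip(*dist.items())
        tok = rng.choices(tokens, weights=weights)[0]
        if tok == END:
            break
        out.append(tok)
        prev = tok
    return "".join(out)
```

For example, `generate(train(TOY_DATA), "3.1.1.1", seed=1)` only ever emits residues seen in that class's toy sequences. A real conditional language model replaces the bigram table with a learned network over the whole preceding context, but the control flow is the same: the class prompt is fixed up front and every subsequent sampling step is conditioned on it.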
