
Workshop: Machine Learning in Structural Biology Workshop

Ligand-aware protein sequence design using protein self contacts

Jody Mou · Benjamin Fry · Chun-Chen Yao · Nicholas Polizzi


The design of ligand-binding proteins remains a significant challenge. Few, if any, structure-to-sequence deep learning methods include representations of small molecules for use in sequence design. Here, we show that favorable interactions between chemical-group fragments and proteins can be learned from large databases consisting of protein self contacts. We approximate ligands as collections of proteinaceous chemical groups and train simple multilayer perceptrons (MLPs) to learn amino-acid identities when conditioned on the placement of these chemical groups relative to the backbone of a residue. We use fragment-aware amino-acid probabilities to compute the binding-site residues of protein-ligand structures and evaluate our method by sequence recovery. Surprisingly, this simple fragment-aware feature can in some cases accurately predict residue identities with no prior knowledge of binding site structures.
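
The evaluation metric named in the abstract, sequence recovery, is commonly taken to be the fraction of positions at which the most probable predicted amino acid matches the native residue. The sketch below illustrates that common definition only; the function names, the toy probabilities, and the three-residue binding site are hypothetical and are not taken from the authors' work, whose exact evaluation protocol may differ.

```python
# Illustrative sketch of sequence recovery (not the authors' code).
# All names and numbers below are hypothetical.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues, one-letter codes


def sequence_recovery(probs, native):
    """Return (predicted sequence, fraction of positions matching native).

    probs  : list of rows, one per binding-site residue; each row holds 20
             probabilities ordered as in AMINO_ACIDS.
    native : string of native one-letter residue codes, same length as probs.
    """
    if len(native) == 0 or len(probs) != len(native):
        raise ValueError("probs and native must be non-empty and equal in length")
    predicted = []
    for row in probs:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each row must contain 20 probabilities")
        # Pick the index of the most probable amino acid at this position.
        best = max(range(len(row)), key=row.__getitem__)
        predicted.append(AMINO_ACIDS[best])
    predicted = "".join(predicted)
    matches = sum(p == n for p, n in zip(predicted, native))
    return predicted, matches / len(native)


def peaked_row(residue, peak=0.62):
    """Toy distribution: `peak` mass on one residue, the rest spread evenly."""
    rest = (1.0 - peak) / (len(AMINO_ACIDS) - 1)
    return [peak if aa == residue else rest for aa in AMINO_ACIDS]


# Hypothetical three-residue binding site whose native sequence is "HDW".
toy_probs = [peaked_row("H"), peaked_row("D"), peaked_row("F")]
predicted, recovery = sequence_recovery(toy_probs, "HDW")
print(predicted, round(recovery, 3))  # prints: HDF 0.667
```

In this toy case two of the three positions are recovered, giving a recovery of about 0.667. In practice the probability rows would come from the trained model rather than from a hand-built distribution.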
