
Workshop: Machine Learning in Structural Biology

Turning high-throughput structural biology into predictive drug design

Kadi Saar · Daren Fearon · John Chodera · Frank von Delft · Alpha Lee

Abstract: A common challenge in drug design is finding chemical modifications to a ligand that increase its affinity for the target protein. An untapped opportunity lies in the rise of structural biology throughput. It has progressed from an artisanal endeavour to up to 100 different ligands per month against a single protein at modern synchrotrons. However, the missing piece is a framework that turns high-throughput crystallography data into predictive ligand design. Here, we design a simple machine learning approach that predicts protein-ligand affinity. It works from experimental structures of diverse ligands bound to a single protein, paired with biochemical measurements. Our key insight is to represent protein-ligand complexes with physics-based energy descriptors and to use a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 Main Protease (M$^{\mathrm{Pro}}$) involving over 200 protein-ligand complexes. Over the time course of the campaign, our models predicted the relative binding strength of the ligands with high accuracy. Crucially, our approach successfully extends ligands into unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry. We used this approach to design compounds that improved the potency of two different micromolar hits by over 10-fold. This arrived at a lead compound with 80 nM antiviral efficacy, amongst the highest reported to date for non-covalent inhibitors.
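The abstract does not say which ranking model is used. The sketch below shows one generic way a learning-to-rank model can be trained on per-complex descriptor vectors: a RankNet-style pairwise logistic model. It learns a linear score whose differences predict which of two ligands binds more strongly. All names here are hypothetical illustrations, not the authors' implementation: `pairwise_differences`, `fit_rank_model`, `score`, `concordance`, the descriptor columns, and the use of a pIC50-like affinity value. The training data in the usage example are synthetic.

```python
import numpy as np


def pairwise_differences(X, y):
    """Build (x_i - x_j, label) pairs for every ordered pair with distinct affinities.

    X: (n_ligands, n_descriptors) array of hypothetical energy descriptors.
    y: (n_ligands,) affinities, higher = stronger binding (assumed pIC50-like).
    Label is 1 if ligand i binds more strongly than ligand j, else 0.
    """
    diffs, labels = [], []
    n = len(y)
    for i in range(n):
        for j in range(n):
            if i != j and y[i] != y[j]:
                diffs.append(X[i] - X[j])
                labels.append(1.0 if y[i] > y[j] else 0.0)
    return np.array(diffs), np.array(labels)


def fit_rank_model(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Pairwise logistic ranking: P(i ranks above j) = sigmoid(w . (x_i - x_j))."""
    D, t = pairwise_differences(X, y)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        z = D @ w
        # Numerically stable sigmoid: 1 / (1 + exp(-z)) = exp(-log(1 + exp(-z)))
        p = np.exp(-np.logaddexp(0.0, -z))
        # Gradient of the mean pairwise log-loss plus an L2 penalty on w
        grad = D.T @ (p - t) / len(t) + l2 * w
        w -= lr * grad
    return w


def score(X, w):
    """Linear ranking score; only differences between scores are meaningful."""
    return X @ w


def concordance(scores, y):
    """Fraction of ligand pairs (with distinct affinities) ordered correctly by the scores."""
    correct = total = 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] != y[j]:
                total += 1
                correct += (scores[i] - scores[j]) * (y[i] - y[j]) > 0
    return correct / total


if __name__ == "__main__":
    # Synthetic toy data only: 40 "ligands" with 3 made-up descriptors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=40)
    w = fit_rank_model(X[:30], y[:30])
    print("held-out pairwise concordance:", concordance(score(X[30:], w), y[30:]))
```

Training on pairwise differences rather than absolute affinities means the model only has to learn which of two binding modes is better. That matches the "relative binding strength" framing above, and it is insensitive to a constant offset between assay batches.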
