Workshop: AI for Science: Mind the Gaps

Regression modeling on DNA encoded libraries

Ralph Ma · Gabriel Dreiman · Fiorella Ruggiu · Adam Riesselman · Bowen Liu · Mohammad M Sultan · Daphne Koller


DNA encoded libraries (DELs) are pooled, combinatorial compound collections in which each member is tagged with a unique DNA barcode. DELs are used in drug discovery for early hit finding against protein targets. Recently, several groups have proposed building machine learning models on quantities derived from DEL datasets. However, DEL datasets have a low signal-to-noise ratio, which makes modeling them challenging. To address this, we propose a novel graph neural network (GNN)-based regression model that directly predicts enrichment scores from raw sequencing counts while accounting for multiple sources of technical variation and intrinsic assay noise. We show that our GNN regression model quantitatively outperforms standard classification approaches and can be used to find diverse sets of molecules in external virtual libraries.