LLMs as Virtual Instruments for Drug Formulation
Michael Craig · Gary Tom · Pauric Bannigan · Christine Allen · Riley Hickman
Abstract
Pharmaceutical formulation design is a long-tail problem in which most drug candidates are supported only by small, heterogeneous datasets. Although each case is distinct, its resolution is critical to the clinical and commercial success of drug products. While large language models (LLMs) have been increasingly applied to accelerate scientific discovery, formulation science remains relatively understudied, largely due to a scarcity of suitable public data. We evaluate whether commercial LLMs can act as virtual scientific instruments by encoding self-emulsifying drug delivery systems (SEDDS) into compact “formulation cards” that serialize composition, physicochemical descriptors, and design metadata into a standardized prompt. We benchmark these models on three regression targets: droplet size, active pharmaceutical ingredient (API) loading mass fraction, and polydispersity index (PDI), in two deployment regimes: within-API generalization (bootstrapping new formulations from previous batches) and between-API generalization (cold-start prediction for held-out APIs). We systematically vary inference-time reasoning effort and few-shot context size $K \in \{0,5,10,20\}$, and find that increasing $K$ improves predictive accuracy, while increasing reasoning effort yields only small to modest, target-dependent gains. Larger models also generally outperform smaller ones, suggesting that inference-time scaling and in-context learning provide practical knobs for improving predictive accuracy without retraining. Together, these results establish the first ML baselines on the SEDDS corpus and position frontier LLMs as flexible, data-efficient instruments for accelerating drug formulation design in resource-constrained discovery pipelines. They also offer a reference point for evaluating how customized open-source models, via fine-tuning, retrieval, or hybrid approaches, compare under consistent conditions.
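To make the formulation-card idea concrete, the snippet below is a minimal Python sketch of how a SEDDS record might be serialized into a card and assembled into a few-shot prompt with $K$ in-context examples. The field names, card layout, and prompt wording are illustrative assumptions, not the exact schema used in the paper.

```python
# Illustrative sketch only: the card fields and prompt wording are assumptions
# for demonstration, not the schema reported in the paper.

def formulation_card(record: dict) -> str:
    """Serialize one SEDDS formulation into a compact text 'card'."""
    composition = ", ".join(
        f"{name} {frac:.2f}" for name, frac in record["composition"].items()
    )
    lines = [
        f"API: {record['api']}",
        f"Composition (w/w): {composition}",
        f"Physicochemical descriptors: logP={record['logP']:.2f}, MW={record['mw']:.1f} g/mol",
        f"Design metadata: {record['metadata']}",
    ]
    return "\n".join(lines)


def build_prompt(query: dict, examples: list[dict], target: str = "droplet size (nm)") -> str:
    """Assemble a few-shot prompt: K labeled cards followed by the unlabeled query card."""
    shots = "\n\n".join(
        formulation_card(ex) + f"\n{target}: {ex['label']}" for ex in examples
    )
    return (
        f"Predict the {target} of the formulation below.\n\n"
        f"{shots}\n\n"
        f"{formulation_card(query)}\n{target}:"
    )
```

Under this reading, the $K \in \{0,5,10,20\}$ sweep corresponds to the number of labeled cards placed in `examples`, with the model's numeric completion parsed as the regression prediction.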