

Poster in Workshop: New Frontiers of AI for Drug Discovery and Development

Leveraging expert feedback to align proxy and ground truth rewards in goal-oriented molecular generation

Julien Martinelli · Yasmine Nahal · Duong Lê · Ola Engkvist · Samuel Kaski

Keywords: [ goal-oriented molecular generation ] [ human-in-the-loop ] [ active learning ]


Abstract:

Reinforcement learning has proven useful for de novo molecular design: a reward function associated with a given design task allows the chemical space to be explored efficiently, producing relevant candidates. However, while tasks that optimize drug-likeness properties such as LogP or molecular weight enjoy tractable, cheap-to-evaluate reward definitions, more realistic objectives such as bioactivity or binding affinity do not. For such tasks, the ground-truth reward is prohibitively expensive to compute inside a molecule generation loop, so it is usually replaced by the output of a statistical model. Such a model becomes a faulty reward signal once queried out of its training distribution, which typically happens when exploring the chemical space, leading to molecules the system judges promising but that do not align with reality. We investigate this alignment problem through the lens of human-in-the-loop machine learning and propose a combination of two reward models, trained independently on experimental data and on expert feedback, together with a gating process that decides which model's output is used as the reward for a given candidate. This combined system can be fine-tuned as expert feedback is acquired throughout the molecular design process, using several active learning criteria that we evaluate. In this active learning regime, our combined model improves over the vanilla setting, even under noisy expert feedback.
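The gated two-model reward described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the GatedReward class, the random-forest reward heads, the uncertainty-threshold gating rule, and the disagreement-based acquisition criterion are all assumptions made for the sake of the example.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

class GatedReward:
    """Combine a proxy reward model (fit on experimental data) with an
    expert-feedback reward model, routed by a simple gate.

    The gate used here (per-candidate uncertainty of the proxy ensemble
    exceeding a threshold) is only one plausible choice.
    """

    def __init__(self, uncertainty_threshold=0.3):
        self.proxy = RandomForestRegressor(n_estimators=100)   # trained on assay data
        self.expert = RandomForestRegressor(n_estimators=100)  # trained on expert scores
        self.tau = uncertainty_threshold

    def fit_proxy(self, X, y):
        self.proxy.fit(X, y)

    def fit_expert(self, X, y):
        self.expert.fit(X, y)

    def _proxy_uncertainty(self, X):
        # Std of per-tree predictions as a cheap out-of-distribution signal.
        preds = np.stack([tree.predict(X) for tree in self.proxy.estimators_])
        return preds.std(axis=0)

    def reward(self, X):
        # Route each candidate to the expert head when the proxy is uncertain.
        proxy_r = self.proxy.predict(X)
        expert_r = self.expert.predict(X)
        use_expert = self._proxy_uncertainty(X) > self.tau
        return np.where(use_expert, expert_r, proxy_r)

    def acquire(self, X_pool, k=5):
        # Active-learning step: query the expert on the k candidates where
        # the two reward heads disagree most.
        gap = np.abs(self.proxy.predict(X_pool) - self.expert.predict(X_pool))
        return np.argsort(gap)[-k:]

# Toy usage with random features standing in for molecular descriptors.
rng = np.random.default_rng(0)
X_assay, y_assay = rng.normal(size=(200, 64)), rng.normal(size=200)
X_fb, y_fb = rng.normal(size=(50, 64)), rng.normal(size=50)

model = GatedReward()
model.fit_proxy(X_assay, y_assay)
model.fit_expert(X_fb, y_fb)

candidates = rng.normal(size=(10, 64))
print(model.reward(candidates))        # per-candidate rewards for the generator
print(model.acquire(candidates, k=3))  # indices to send to the expert for feedback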
