rbio1 - training scientific reasoning LLMs with biological world models as soft verifiers
Abstract
Reasoning models are typically trained against verification mechanisms in formally specified systems such as code or symbolic math. In open domains like biology, however, we lack exact rules for large-scale formal verification and instead rely on lab experiments to test predictions. Such experiments are slow, costly, and cannot scale with computation. In this work, we show that biological world models and other prior knowledge can serve as approximate oracles for soft verification, allowing reasoning systems to be trained without additional experimental data. We introduce two paradigms for this process: RLEMF (reinforcement learning with experimental model feedback) and RLPK (reinforcement learning from prior knowledge). Using these paradigms, we develop rbio1, a reasoning model for biology post-trained from a pretrained LLM with reinforcement learning. Soft verification distills the knowledge of biological world models into rbio1, which achieves state-of-the-art performance on the PERTURBQA benchmark. We present rbio1 as a proof of concept that predictions from biological models can train powerful reasoning systems using simulations rather than experimental data, offering a new paradigm for model training.
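To make the soft-verification idea concrete, the sketch below shows how an approximate oracle's prediction could replace an exact verifier as a scalar RL reward. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the WorldModel class, soft_reward function, gene names, and GRPO-style advantage computation are all hypothetical.

```python
# Minimal sketch of soft verification with a biological world model acting as
# an approximate oracle. All names and data here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class WorldModel:
    """Stand-in for a pretrained biological world model that outputs a
    probability that perturbing `gene` changes expression of `target`."""
    predictions: dict  # (gene, target) -> probability in [0, 1]

    def prob_effect(self, gene: str, target: str) -> float:
        # 0.5 acts as an uninformative prior for unseen pairs.
        return self.predictions.get((gene, target), 0.5)

def soft_reward(answer_yes: bool, oracle_prob: float) -> float:
    """Soft verification: instead of an exact pass/fail check, score the
    policy's yes/no answer by the oracle's (possibly imperfect) confidence.
    Returns a value in [0, 1]; 1.0 means the oracle fully agrees."""
    return oracle_prob if answer_yes else 1.0 - oracle_prob

# Toy usage: score a group of sampled answers from a reasoning policy.
oracle = WorldModel(predictions={("TP53", "CDKN1A"): 0.9})

sampled_answers = [True, False, True]  # yes/no answers parsed from rollouts
rewards = [soft_reward(a, oracle.prob_effect("TP53", "CDKN1A"))
           for a in sampled_answers]

# These scalar rewards would feed a standard RL objective, e.g. a group-relative
# advantage over the rollouts, to post-train the LLM without new experiments.
mean_r = sum(rewards) / len(rewards)
advantages = [r - mean_r for r in rewards]
print(rewards, advantages)
```

The key design point this sketch captures is that the oracle returns a graded confidence rather than a binary pass/fail signal, so the reward remains informative even when the world model is only approximately correct.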