rbio1 - training scientific reasoning LLMs with biological world models as soft verifiers
Abstract
Reasoning models are typically trained against verification mechanisms in formally specified systems such as code or symbolic math. In open domains like biology, however, we lack exact rules for large-scale formal verification and instead rely on lab experiments to test predictions. Such experiments are slow, costly, and cannot scale with computation. In this work, we show that biological world models and other prior knowledge can serve as approximate oracles for soft verification, allowing reasoning systems to be trained without additional experimental data. We introduce two paradigms for this process: RLEMF (reinforcement learning with experimental model feedback) and RLPK (reinforcement learning from prior knowledge). Using these paradigms, we develop rbio1, a reasoning model for biology post-trained from a pretrained LLM with reinforcement learning. Soft verification distills the knowledge of biological world models into rbio1, which achieves state-of-the-art performance on the PERTURBQA benchmark. We present rbio1 as a proof of concept that predictions from biological models can train powerful reasoning systems using simulations rather than experimental data, offering a new paradigm for model training.
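To make the soft-verification idea concrete, the sketch below shows how an approximate oracle's prediction could replace an exact verifier as a scalar RL reward. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the WorldModel class, soft_reward function, gene names, and GRPO-style advantage computation are all hypothetical.

```python
# Minimal sketch of soft verification with a biological world model acting as
# an approximate oracle. All names and data here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class WorldModel:
    """Stand-in for a pretrained biological world model that outputs a
    probability that perturbing `gene` changes expression of `target`."""
    predictions: dict  # (gene, target) -> probability in [0, 1]

    def prob_effect(self, gene: str, target: str) -> float:
        # 0.5 acts as an uninformative prior for unseen pairs.
        return self.predictions.get((gene, target), 0.5)

def soft_reward(answer_yes: bool, oracle_prob: float) -> float:
    """Soft verification: instead of an exact pass/fail check, score the
    policy's yes/no answer by the oracle's (possibly imperfect) confidence.
    Returns a value in [0, 1]; 1.0 means the oracle fully agrees."""
    return oracle_prob if answer_yes else 1.0 - oracle_prob

# Toy usage: score a group of sampled answers from a reasoning policy.
oracle = WorldModel(predictions={("TP53", "CDKN1A"): 0.9})

sampled_answers = [True, False, True]  # yes/no answers parsed from rollouts
rewards = [soft_reward(a, oracle.prob_effect("TP53", "CDKN1A"))
           for a in sampled_answers]

# These scalar rewards would feed a standard RL objective, e.g. a group-relative
# advantage over the rollouts, to post-train the LLM without new experiments.
mean_r = sum(rewards) / len(rewards)
advantages = [r - mean_r for r in rewards]
print(rewards, advantages)
```

The key design point this sketch captures is that the oracle returns a graded confidence rather than a binary pass/fail signal, so the reward remains informative even when the world model is only approximately correct.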