Physics–Preference Aligned Tool-Using Policies for Molecular Design with Gemma-3 270M
Abstract
Discovering molecules with desirable electronic and biophysical properties remains a central challenge in chemistry. Data-driven generators can ignore physical constraints and demand large simulation budgets. We study whether a compact, instruction-tuned language model can accelerate design—rather than pure prediction—when coupled with rigorous physics-based feedback. We present PhysPref, a two-module framework built around the Gemma-3 270M model. A Reporter parses simulator outputs into normalized JSON, while a Planner produces structured tool-calling sequences. The Planner is aligned to quantum and docking objectives via Direct Preference Optimization (DPO) and governed by a compute-budget controller with explicit cost accounting. On PCQM4Mv2, FreeSolv, and AqSolDB, PhysPref discovers higher-quality molecules under a fixed budget and, in our runs, uses roughly 20–40% fewer expensive calls than strong non-LLM baselines at comparable design quality.
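To make the idea of a compute-budget controller with explicit cost accounting concrete, the following is a minimal sketch and not the authors' implementation. The class name `BudgetController`, the helper `run_plan`, the tool names, and the per-call costs are all hypothetical, chosen only for illustration. It assumes each tool has a fixed, known cost and that a call which would exceed the budget is simply skipped.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetController:
    """Hypothetical controller: tracks spend against a fixed compute budget."""
    budget: float                      # total budget in arbitrary cost units
    costs: dict[str, float]            # assumed fixed cost per tool call
    spent: float = 0.0
    ledger: list[tuple[str, float]] = field(default_factory=list)

    def can_afford(self, tool: str) -> bool:
        # A call is affordable if it would not push spend past the budget.
        return self.spent + self.costs[tool] <= self.budget

    def charge(self, tool: str) -> bool:
        # Record the cost and return True, or refuse and return False.
        if not self.can_afford(tool):
            return False
        cost = self.costs[tool]
        self.spent += cost
        self.ledger.append((tool, cost))
        return True


def run_plan(plan: list[str], controller: BudgetController) -> list[str]:
    """Walk a planned tool sequence, skipping calls the budget cannot cover."""
    executed = []
    for tool in plan:
        if controller.charge(tool):
            executed.append(tool)  # a real system would invoke the tool here
    return executed


# Illustrative costs only; these numbers are not taken from the paper.
controller = BudgetController(
    budget=65.0,
    costs={"cheap_filter": 1.0, "docking": 10.0, "dft": 50.0},
)
plan = ["cheap_filter", "dft", "docking", "dft", "cheap_filter"]
executed = run_plan(plan, controller)
```

In this toy run the second `dft` call is refused because it would exceed the budget, while the cheaper calls around it still go through. The ledger keeps a per-call record, which is one simple way to count expensive calls when comparing methods under a fixed budget.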