SafeDiscovery–Plans: An Open, Safety‑Constrained Scientific Planning Dataset for Agentic AI Across High‑Risk Domains
Abstract
Scientific discovery routinely involves executing complex sequences of laboratory steps while navigating institutional policies, biosafety levels and regulatory constraints. Current language models excel at general planning but falter when tasks demand both scientific competence and rigorous adherence to safety rules. We introduce SafeDiscovery–Plans, an open dataset of safety‑constrained scientific plans designed to teach agentic AI how to transform high‑level research goals into safe, compliant procedures. Each example pairs a goal and laboratory setting with a validated, stepwise plan that either accomplishes the objective or proposes a safe redirection when it cannot be achieved under the given constraints. Plans include personal protective equipment (PPE), engineering controls, safe substitutions, decision points and citations to authoritative sources. The first version will contain roughly 30,000 records spanning chemistry, biology and other high‑risk domains, with a roadmap to larger scale. By supplying structured supervision for policy‑grounded planning, SafeDiscovery–Plans fills a critical gap between capability‑centric benchmarks and refusal‑centric safety datasets.
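To make the record structure described above concrete, the following is a minimal sketch of how one example might be represented. The class names, field names, the `outcome` values and the `validate` checks are all illustrative assumptions; the abstract does not specify the dataset's actual schema, and the sample record is a placeholder, not real dataset content.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical outcome labels: a plan either accomplishes the goal
# or proposes a safe redirection (names assumed for illustration).
ALLOWED_OUTCOMES = ("accomplish", "redirect")


@dataclass
class PlanStep:
    """One step of a stepwise plan (assumed structure)."""
    description: str
    ppe: List[str] = field(default_factory=list)
    engineering_controls: List[str] = field(default_factory=list)
    is_decision_point: bool = False


@dataclass
class PlanRecord:
    """One dataset example: a goal and lab setting paired with a plan."""
    goal: str
    lab_setting: str
    outcome: str
    steps: List[PlanStep]
    safe_substitutions: List[str] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of structural problems (empty if none found)."""
        problems = []
        if self.outcome not in ALLOWED_OUTCOMES:
            problems.append(f"unknown outcome: {self.outcome!r}")
        if not self.steps:
            problems.append("plan has no steps")
        if not self.citations:
            problems.append("plan cites no sources")
        return problems


# Placeholder record for illustration only.
example = PlanRecord(
    goal="Prepare a buffer solution for a teaching demonstration",
    lab_setting="Teaching laboratory, BSL-1",
    outcome="accomplish",
    steps=[
        PlanStep(
            description="Review the safety data sheet for each reagent",
            is_decision_point=True,
        ),
        PlanStep(
            description="Weigh and dissolve reagents at the bench",
            ppe=["lab coat", "safety glasses", "gloves"],
        ),
    ],
    citations=["PLACEHOLDER: institutional chemical hygiene plan"],
)

print(example.validate())
```

A structural check of this kind only confirms that a record is well formed; it says nothing about whether the plan itself has been validated for safety, which the abstract treats as a separate property of each example.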