Biorisk-Shift: Converting AI Vulnerabilities into Biological Threat Vectors
Abstract
Generative AI models are increasingly applied in biotechnological domains, yet existing safeguards fail to account for how diverse conversational failure modes can be repurposed into biorisk-relevant attacks. We present a crowdsourced, multi-turn jailbreak dataset collected from non-technical university students, who achieved effective red-teaming capability after five or more hours of training, together with transformation protocols that systematically identify safety vulnerabilities in frontier language models. Central to our study is a domain-targeted Biorisk-Shift transformation that leverages this dataset to convert general jailbreak patterns into high-stakes biological contexts, bypassing biosafety guardrails at a rate of 53.5\%. Complementary transformations, including Attack Enhancement and Failure Root-Cause Iteration, further expand the range of elicited harmful outputs. Benchmarks against defense-filtered models show that even state-of-the-art safeguards can be circumvented, underscoring how ordinary conversational exploits can escalate into risks for protein design, genome editing, and molecular synthesis. Our findings demonstrate the need for biosecurity-specific evaluation methods that draw on a large contributor base, and for integrated safeguards that directly address the translation of everyday model failures into extreme biological threats.