Resisting RL Elicitation of Biosecurity Capabilities: Reasoning Models Exploration Hacking on WMDP

Joschka Braun · Yeonwoo Jang · Damon Falck · Roland S. Zimmermann · David Lindner · Scott Emmons

Abstract
