

Poster Thu, Dec 4, 2025 • 4:30 PM – 7:30 PM PST

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Cameron Tice ⋅ Philipp Kreer ⋅ Nathan Helm-Burger ⋅ Prithviraj Shahani ⋅ Fedor Ryzhenkov ⋅ Fabien Roger ⋅ Clement Neo ⋅ Jacob Haimes ⋅ Felix Hofstätter ⋅ Teun van der Weij

