

Poster

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Cameron Tice ⋅ Philipp Kreer ⋅ Nathan Helm-Burger ⋅ Prithviraj Shahani ⋅ Fedor Ryzhenkov ⋅ Fabien Roger ⋅ Clement Neo ⋅ Jacob Haimes ⋅ Felix Hofstätter ⋅ Teun van der Weij
2025 Poster

Abstract
