

Poster

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Richard Ren · Steven Basart · Adam Khoja · Alice Gatti · Long Phan · Xuwang Yin · Mantas Mazeika · Alexander Pan · Gabriel Mukobi · Ryan Kim · Stephen Fitz · Dan Hendrycks
2024 Poster


