

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

Kyle O'Brien ⋅ Stephen Casper ⋅ Quentin Anthony ⋅ Tomek Korbak ⋅ Robert Kirk ⋅ Xander Davies ⋅ Ishan Mishra ⋅ Geoffrey Irving ⋅ Yarin Gal ⋅ Stella Biderman

Abstract
