Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

Kyle O'Brien · Stephen Casper · Quentin Anthony · Tomek Korbak · Robert Kirk · Xander Davies · Ishan Mishra · Geoffrey Irving · Yarin Gal · Stella Biderman

Abstract
