Poster in Workshop: Safe Generative AI

Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Unlearning

Zora Che · Stephen Casper · Anirudh Satheesh · Rohit Gandikota · Domenic Rosati · Stewart Slocum · Lev McKinney · Zichu Wu · Zikui Cai · Bilal Chughtai · Furong Huang · Dylan Hadfield-Menell
