Poster in Workshop: Safe Generative AI

Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Unlearning

Zora Che ⋅ Stephen Casper ⋅ Anirudh Satheesh ⋅ Rohit Gandikota ⋅ Domenic Rosati ⋅ Stewart Slocum ⋅ Lev McKinney ⋅ Zichu Wu ⋅ Zikui Cai ⋅ Bilal Chughtai ⋅ Furong Huang ⋅ Dylan Hadfield-Menell

Abstract