Paraphrasing Away Malicious Tokens: Improving LLM Finetuning Safety by Filtering Spurious Correlation

Marcel Mateos Salles · Praney Goyal · Pradyut Sekhsaria · Hai Huang · Randall Balestriero

Abstract
