Poster in Workshop: Safe Generative AI

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

Neel Jain ⋅ Aditya Shrivastava ⋅ Chenyang Zhu ⋅ Daben Liu ⋅ Alfy Samuel ⋅ Ashwinee Panda ⋅ Anoop Kumar ⋅ Micah Goldblum ⋅ Tom Goldstein

Abstract
