AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Sicheng Zhu ⋅ Ruiyi Zhang ⋅ Bang An ⋅ Gang Wu ⋅ Joe Barrow ⋅ Zichao Wang ⋅ Furong Huang ⋅ Ani Nenkova ⋅ Tong Sun

Abstract
