

Poster

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Liwei Jiang ⋅ Kavel Rao ⋅ Seungju Han ⋅ Allyson Ettinger ⋅ Faeze Brahman ⋅ Sachin Kumar ⋅ Niloofar Mireshghallah ⋅ Ximing Lu ⋅ Maarten Sap ⋅ Yejin Choi ⋅ Nouha Dziri
2024 Poster

Abstract

